PREFACE


The Australian Bureau of Statistics (ABS) provides a high-quality,
user-oriented and dynamic statistical service for all sectors of the 
community. The statistical information it makes available has an 
important role to play in the decision-making processes that are 
undertaken by governments, private businesses and individuals. 


Sample surveys are used extensively by the ABS as part of its role in 
collecting information for these decision-making processes. 


This publication is intended to be a basic and practical guide to the use 
of sample surveys for the purpose of conducting all types of research or 
information gathering. The chapters are structured in a logical 
sequence—from the aims and objectives of the survey through to the 
presentation of the results. The publication is by no means an exhaustive 
study of the theory and practice of sample surveys, both of which are 
further covered in numerous journals and books. 


Prudent users of statistics will realise that running a good survey
requires a healthy dose of common sense along with the application of
sound statistical principles. Poorly run surveys can be costly, both in
operating costs and in the quality of the data they produce. Data from a
survey must be suitable for meaningful analysis and informative 
presentation. It is hoped that this publication will help people 
understand more about sample surveys and how to make better use of 
them. 


An Introduction to Sample Surveys draws on a high level of statistical 
expertise and many years of practical experience by ABS staff. The fact 
that this issue marks the fourth printing of this user guide (previously 
released as catalogue no. 1202.2) is indicative of the value of its 
contents. 


The ABS also provides a statistical consultancy and training service as 
part of its range of products and services. This service can provide advice 
on conducting surveys, questionnaire design, analysis of results, or any 
other aspect of conducting a sample survey. For assistance or advice in 
conducting a sample survey or any other aspect of statistical services, you 
can contact the ABS Statistical Consultancy Service—contact details are 
provided on the last page. 


W. McLennan 
Australian Statistician 
ABBREVIATIONS AND SYMBOLS


ABS     Australian Bureau of Statistics
CAI     Computer-assisted interviewing
CATI    Computer-assisted telephone interviewing
ICR     Intelligent character recognition
OCR     Optical character recognition
OMR     Optical mark recognition
PES     Post-enumeration survey
CHAPTER 1


AIMS AND OBJECTIVES


INTRODUCTION


A major aspect of any research is the gathering of information. Whether
the information will be used as the basis for decision-making, for the
allocation of funds, to analyse the outcome of policies or programs, or to
determine the direction of future operations, a number of factors need to
be considered before deciding to commission or undertake a survey. This
chapter is intended to help researchers and managers identify some of
these factors, analyse their requirements and select the most appropriate
method of collecting the necessary information.


REASONS FOR THE STUDY


Before undertaking any research or study it is essential to define the
purposes of the study and to translate these into specific information
requirements.


What is the population being studied?


The first consideration is to define the target population. The target
population must be an identifiable group which is relevant to the study.
Examples of target populations are persons aged 65 and over, retail
businesses in a specific urban area, and motor vehicles produced in a
given year.


What do you want to know about this population?


Having defined the population or group under study, the next step is to
decide what information needs to be collected. For example, a study may
be aimed at describing a target population in terms of specific
characteristics such as age, sex, income or employment group, or may be
far more subjective in nature, collecting information on background,
community attitudes, opinions etc.


If the requirement is for quantitative data, the researcher needs to
determine whether simple frequency counts are required (e.g. basic
counts of responses in each cell or box on a questionnaire) or whether
there is a need for detailed cross-classified tabulations (such as age by
sex by country of birth). If this type of cross-tabulation is expected to
be required, it is important to specify the detailed tables as early as
possible, because these tables will determine both the content of the
survey and the size of sample required.


Is the information required on an ongoing basis?


If a sample survey is used as the means of collecting the required
information, the requirements for survey design and management may differ
depending on whether the data are to be collected on an ongoing basis or
only once. Such decisions need to be made early to ensure that the
eventual data collection process is the most effective.


IS A SURVEY APPROPRIATE?


Although a sample survey may provide the data required for a study, it is
not always the most practical or efficient method to adopt. A number of
alternative sources and methodologies, together with their relative
advantages and disadvantages, are discussed below.
Existing data sources


Having defined the target population and data requirements, the next
step is to find out whether the information needed is already available 
from another source. Possible data sources include: 


■ Research of subject documentation. Some topics may be satisfactorily
approached through systematic examination of existing documents, for
example, academic journals, books, newspapers, information papers,
government files etc.


■ Administrative by-product. A second major source of existing data is
administrative by-product information, i.e. statistics kept by
government departments and agencies as part of their ongoing
operations. Examples of these agencies include the Registrar of Births,
Deaths, and Marriages; the Department of Foreign Affairs and Trade;
and the road traffic authorities.


■ Previous surveys. These include surveys conducted in previous years,
interstate or overseas, the results of which are usually available in
tabulated form. In many cases additional data can be obtained
through special tables generated from the survey’s unit record file or
by other means by contacting the agency responsible for conducting
the survey.


The major advantages of existing data sources are that information can
be obtained comparatively quickly and at relatively low cost. However, in
many cases the data available may be only an approximation of what is
required, and may not be current (many existing statistical collections
have been run on a one-off or infrequent basis). The lack of a single
source from which to obtain a range of related data may also lead to
problems of comparability between data items.


Non-survey methodologies


Set out below are three alternatives to running a sample survey that may
provide sufficient information to satisfy the researcher’s requirements:


■ Focus groups. This technique will be discussed in chapter 4. The
process involves bringing together a number of people who have
similar identifiable characteristics to discuss the issues involved. A
group usually consists of between 6 and 12 people.


■ Controlled experiment. This method is used in the medical and
scientific fields and usually involves selecting two groups of subjects
as similar to each other as possible; one group is designated the
experimental group, the other the control group. The experimental group
is subjected to some form of change (experiment) while factors affecting
the control group are kept constant. Data are collected from each group
both before and after the experiment and any changes may then be
measured.


■ Case study. In preparing a case study the researcher seeks to collect
and analyse as much data about the chosen subject as possible from a
relatively small number of cases.


A detailed discussion of controlled experiments and case studies is
outside the scope of this user’s guide.
The main advantage of these methodologies is that they allow the
researcher to collect quite detailed information for a comparatively
modest cost. The major disadvantage, however, is the small number of
participants studied, which may result in the data not being
representative of the target population.


Full enumeration (census)


Where resources permit, a complete enumeration of the population
under study can overcome many of the disadvantages associated with
sample surveys and yield reliable information, but at maximum cost. In a
census the objective is to collect data in relation to every member of the
population under study. The advantages include:


■ data will be truly representative of the whole population;


■ data are generally available at highly disaggregated levels, e.g. for
small geographic areas or sub-sets of the population; thus detailed
cross-tabulations are possible, e.g. age by sex by country of birth; and


■ benchmark data may be obtained for future studies, e.g. a census of
retail establishments may yield data on stocks, turnover or
employment. This can then be used to determine a suitable sampling
frame (or list of the members of a population or group) for future
surveys, e.g. a survey of retail floorspace or of part-time/full-time
employment.


The main disadvantages of a census are:


■ resource costs are large, both in staff and monetary terms;


■ the number of questions asked has to be kept as small as possible, so
as to minimise both the reporting burden on data providers and
costs;


■ it may be difficult to approach all members of the population within a
reasonable time; and


■ processing time is slow, so the results may become available too late
to be useful.


A full enumeration of the target population may be a very large project
with associated logistical problems. This can lead to errors in the
resultant data output that could be avoided in a smaller survey. Thus, a
small survey conducted effectively may produce higher-quality results
than a full-scale census in which the available resources are stretched
too far.


Sample surveys


In a sample survey, only a part of the total population is approached for
information on the topic under study. These data are then ‘expanded’ or
‘weighted’ to represent the target population as a whole.
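

The idea of 'expanding' sample data can be illustrated with a small
sketch. It assumes the simplest possible case, which this guide does not
prescribe: a simple random sample in which every member of the population
had the same chance of selection and every selected member responded. The
function names and the figures used are hypothetical.

```python
def expansion_weight(population_size: int, sample_size: int) -> float:
    """Number of population members that each respondent represents.

    Assumes a simple random sample with full response (an assumption made
    for illustration only).
    """
    if sample_size <= 0 or sample_size > population_size:
        raise ValueError("sample size must be between 1 and the population size")
    return population_size / sample_size


def estimate_total(sample_values: list, population_size: int) -> float:
    """Expand a sample total to an estimate for the whole population."""
    weight = expansion_weight(population_size, len(sample_values))
    return weight * sum(sample_values)


# Hypothetical example: 5 businesses sampled from a population of 2,000,
# reporting their number of employees.
employees = [4, 10, 2, 7, 2]
print(estimate_total(employees, population_size=2000))
```

Here each sampled business stands for 400 businesses in the population, so
the 25 employees counted in the sample expand to an estimate of 10,000.
Surveys with unequal selection probabilities or with non-response need more
elaborate weights than this.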


Advantages of sample surveys include:


■ resource costs are generally significantly lower than for a census;


■ more, or more detailed, questions can be asked; and


■ results can be available far more quickly.
The major disadvantages of sample surveys are:


■ data may not be representative of the total population, particularly
where the number of respondents is small; and


■ finely classified data (e.g. small area data) are generally not available.
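

A rough calculation shows why finely classified data are hard to obtain
from a sample. The sketch below counts the cells in a cross-classified
table and divides the sample among them. The category counts (10 age
groups, 2 sexes, 20 countries of birth) and the sample size are
hypothetical, and an even spread of respondents across cells is assumed
purely for illustration; real tables are usually far less even.

```python
from math import prod


def cells_in_table(category_counts: list) -> int:
    """Number of cells in a table cross-classified by several variables."""
    return prod(category_counts)


def average_respondents_per_cell(sample_size: int, category_counts: list) -> float:
    """Average number of respondents per cell if spread evenly (an assumption)."""
    return sample_size / cells_in_table(category_counts)


# Hypothetical table: age (10 groups) by sex (2) by country of birth (20).
categories = [10, 2, 20]
print(cells_in_table(categories))                      # 400 cells
print(average_respondents_per_cell(1000, categories))  # 2.5 respondents per cell
```

With only a few respondents in each cell, estimates for individual cells
would be very unreliable, which is why the detailed tables required should
be specified before the sample size is decided.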


SUMMARY


The first task to be undertaken in planning any research project is to
define clearly the aims and objectives of the study. Knowing the target 
population, what information is required and how it is going to be used 
are essential if the researcher is to make an informed evaluation of the 
alternative methods available for collecting data. 


The remainder of the publication assumes that a survey is the most 
appropriate method for gathering the required information and discusses 
a range of issues associated with the collection and analysis of survey 
data. 
CHAPTER 2


DATA COLLECTION METHODS


INTRODUCTION


There are a number of methods available for collecting data, and the
choice between them depends on a number of factors. It should be
stressed at the outset that the success of the survey will depend to a
large extent on the suitability of the collection method chosen.


FACTORS INFLUENCING CHOICE OF METHOD


Nature of the questions


The nature of the questions, and in particular the depth and complexity
of the topics to be covered, will in many cases dictate the collection
method to be employed. Similarly, the quality of responses sought may
determine the choice of an appropriate collection method, e.g. it is
difficult to obtain detailed answers to complex questions by telephone or
mail survey, whereas personal face-to-face interviews will generally yield a
greater depth of response.


Response rates


The quality and reliability of survey data can be affected by the degree of
response to a survey. Although it is rare to achieve a 100% response rate
for any survey, choice of collection method can influence the response
rate obtained. For example, telephone interviews usually achieve a far
better response rate than mail questionnaires.


Resources


Where staff and/or financial resources are limited, the researcher may be
constrained to use, for example, mail-out techniques for the collection
phase of the survey because of the lower cost. Often this will conflict
with the quality requirements of the survey. In these circumstances the
researcher must try to achieve an acceptable compromise, or seek
resources or cost savings elsewhere.


Time


As with resources, the time constraints on the survey may dictate the
choice of methodology. Telephone surveys (particularly using
computer-assisted telephone interviewing) are much quicker than
mail-out surveys or personal interviews. However, savings in time often
necessitate sacrifices in the complexity or sensitivity of the questions
asked, and the depth of responses received.


Sampling frame


The type and quality of the sampling frame (the list of ‘members’ from
which the sample is to be selected) may influence the choice of
collection method, e.g. to conduct a mail survey it is necessary to have a
list of the names and addresses of all elements in the sampling frame. If
this is unavailable there may be no option but to use personal interviews
and an area-based frame for the survey.
OPTIONS AVAILABLE


The commonly used collection methods can be divided into two basic
types: personal interview and self-enumeration. These in turn can be
further divided, and some methods utilise a combination of the elements.
In choosing a collection method for a survey, the advantages and
disadvantages of each type of method should be assessed in the light of
the influencing factors discussed above.


Personal interviewing


There are two types of personal interviewing: face-to-face and telephone.


Face-to-face

As the name suggests, this method involves having an interviewer visit
each ‘member’ selected from the sampling frame for the survey. This
form of data collection is highly effective in terms of establishing rapport,
boosting response rates and data quality, and collecting sensitive or
complex data. However, the disadvantages of face-to-face interviews are
the costs (in staff, time, and money required to obtain, train, and manage
an interviewer workforce), the possibility of bias being introduced by
interviewers, the cost of supervision, and the cost of ‘call backs’ when
respondents are unavailable.


Telephone

Telephone interviewing has a number of advantages over face-to-face
interviewing: costs are usually lower because fewer staff are required,
interview times are generally shorter and there are no travel costs;
supervision may be centralised; and ‘call backs’ and follow-up are quick
and inexpensive.


The disadvantages of telephone surveys include the difficulty of
establishing rapport with respondents, which can lead to lowered
response rates. Other disadvantages include the ease with which the
respondent can terminate the interview, thus leading to problems of
partial response; the need for questionnaires to be brief and simple to
avoid boredom or fatigue on the part of the respondent; and the obvious
limitation that only persons with telephones can be surveyed. The
exclusion of people without telephones may introduce a slight bias into
the survey, e.g. if the researcher is investigating topics such as income
distribution, socioeconomic groupings or employment status. There is
also the issue of respondents screening calls using answering machines,
which may reduce response rates and bias results. Another drawback of
telephone surveys is the need for a frame of applicable telephone
numbers. For example, not all private numbers are listed. This can be
overcome by the use of random digit dialling (RDD); however, this can be
inefficient and can result in a large number of non-contact and
out-of-scope calls.


Self-enumeration


Self-enumeration surveys are those in which it is left to the respondents
to complete the survey questionnaires. Although these are primarily
postal, or mail-out surveys, they can also include hand-delivered
questionnaires.
Postal surveys

In many situations postal surveys can provide an effective and efficient
method of data collection, particularly where information is to be
collected regularly or over a long period and is generally available from
respondents’ records. Postal surveys are a relatively inexpensive method
of collecting data, and it is possible to distribute large numbers of
questionnaires in a very short time. Other advantages of postal surveys
include the ability to cover a wide geographic area; the opportunity to
reach people who are otherwise difficult to contact, such as people away
from home or out on business; and the convenience that it affords
respondents to complete the questionnaires in their own time.


The major disadvantage of postal surveys is that they usually have lower
response rates, leading to potential problems with data quality and
reliability. Other disadvantages include the need for questionnaires to be
kept simple and straightforward to avoid confusion or errors; the
difficulties faced by respondents with only limited ability to read or write
in English; and the time taken to answer correspondence or resolve
queries by mail.


For surveys including businesses, a particular problem with a post-based
approach is the need to ensure the questionnaire is received by the
appropriate person within the business. Failure to ensure that the right
contact within the business receives the questionnaire can result in both
low response rates and poor quality information.


Hand-delivered questionnaires

An alternative form of the self-enumerated survey is where questionnaires
are delivered to, and/or collected from, the respondents personally by an
‘interviewer’ or collector. This method usually results in improved
response rates (compared with a postal survey) and is particularly
suitable where information needs to be collected from several members
of a household, some of whom may be unavailable when an interviewer
calls.


Disadvantages of this methodology include the cost, the need for the
questionnaire still to be relatively straightforward and the difficulty of
achieving a sufficient level or quality of response.


Computer-assisted interviewing


Computer-assisted interviewing (CAI) is a technique applying modern
computer technology to telephone and, sometimes, personal interviewing.
It involves the use of a computer to collect, store, manipulate and
transmit data relating to interviews conducted between the interviewer
and respondents. Computer-assisted personal interviewing is one type of
CAI and involves the conduct of face-to-face (household or business)
interviews between the interviewer and respondent. Computer-assisted
telephone interviewing (CATI) is another type of CAI and involves the
conduct of interviews via the telephone.
Advantages of computer-assisted interviewing

Timeliness. CAI speeds up the whole survey process by integrating data
collection, data entry and data editing and by allowing data to pass
directly from data collection to analysis, therefore enabling users to
receive analysed data more quickly.


Improved data quality. With CAI, sequencing of the questionnaire is
automatic, thus avoiding sequencing errors. Furthermore, errors can be
automatically identified during the interview, and inconsistencies in the
respondent’s answers can be more readily queried, thereby allowing
corrections to be made to answers during the interview stage.


Flexibility. CAI questionnaires can be altered and added to relatively 
quickly, allowing faster response to users’ needs. CAI also has the 
potential to hold more than one survey within the one workload, and 
automatically generate the appropriate survey to be conducted at the 
household or business, without specific instructions being required of the 
interviewer. 


A particular strength of CAI is the capacity to handle complex surveys 
and, for repeated surveys, to offset high development costs through 
repeated application of the survey. 


Disadvantages of computer-assisted interviewing 

These are the high start-up and maintenance costs for equipment, 
software, site preparation etc., and the need for interviewers with 
computer or typing skills involving consequent training overheads. 


The CAI instrument needs to be completely specified and coded before
survey enumeration commences. This includes instructions that would be
common sense to an interviewer working with a printed questionnaire.
For example, interruptions to interviews must be planned for and the
capacity to move back through the questionnaire to revisit a question
must be provided.
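

The following sketch suggests what 'completely specified and coded' can
mean in practice. It is a minimal illustration, not a description of any
actual CAI system: the class names, the two questions and the edit checks
are all hypothetical. Each question carries a check that runs as soon as an
answer is keyed, and the interview keeps track of its position so that the
interviewer can step back to an earlier question.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Question:
    key: str
    text: str
    # Edit check: returns a message if the answer should be queried, else None.
    check: Callable[[str, dict], Optional[str]] = lambda answer, answers: None


@dataclass
class Interview:
    questions: list
    answers: dict = field(default_factory=dict)
    position: int = 0

    def current(self) -> Question:
        return self.questions[self.position]

    def answer(self, value: str) -> Optional[str]:
        """Record an answer, or return a query and stay on the same question."""
        question = self.current()
        problem = question.check(value, self.answers)
        if problem is not None:
            return problem
        self.answers[question.key] = value
        self.position += 1
        return None

    def back(self) -> None:
        """Move back one question so that an earlier answer can be revisited."""
        if self.position > 0:
            self.position -= 1

    def finished(self) -> bool:
        return self.position >= len(self.questions)


def check_age(answer: str, answers: dict) -> Optional[str]:
    return None if answer.isdigit() else "Age must be a whole number."


def check_years_worked(answer: str, answers: dict) -> Optional[str]:
    if not answer.isdigit():
        return "Years worked must be a whole number."
    if int(answer) > int(answers["age"]):
        return "Years worked exceeds age; please query with the respondent."
    return None


interview = Interview([
    Question("age", "What is your age in years?", check_age),
    Question("years_worked", "For how many years have you worked?", check_years_worked),
])
interview.answer("30")
print(interview.answer("45"))  # inconsistent with age, so a query is returned
```

Because the answers and the current position are ordinary data, they could
also be saved and restored, which is one way of planning for an interrupted
interview.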


SUMMARY


The choice of a method of data collection may depend on any or all of
the factors discussed above. The researcher needs to have a clear idea of 
the aims and objectives of the survey, the data to be obtained and the 
degree of accuracy required. It is also necessary to estimate available 
resources in advance, in terms of staff, time, and money. Only then is it 
possible to evaluate realistically the alternative methods and determine 
which is the most appropriate for a particular researcher’s requirements. 
CHAPTER 3


QUESTIONNAIRE DESIGN


INTRODUCTION


An integral part of any sample survey is the questionnaire through which
information is to be gathered. The design of the questionnaire can
influence the response rate achieved by the survey, the quality of
responses gained, and the reliability of conclusions drawn from the
survey results.


GENERAL CONSIDERATIONS


The central aim of a questionnaire is to collect accurate and relevant
data. In order to achieve this, the questionnaire should:


■ enable respondents to complete it accurately within a reasonable time;


■ be easy for interviewers to administer properly;


■ use language that is readily understood by respondents;


■ appear uncluttered on the form/screen; and


■ be easily processed by both people and machines.


Since the first four considerations may conflict with the fifth, it is
important to utilise a well-designed form to reduce this conflict.


To facilitate respondents’ completion of the form, a researcher first needs
to ascertain whether the information sought is readily available from the
respondents. Next, the questions should be designed so as to prevent
confusion arising in the mind of the respondent. This can be achieved in
a number of ways, by:


■ maintaining a logical order in the sequencing of questions
(see the diagram below);


■ minimising and simplifying instructions and explanatory notes;


■ providing clear instructions or explanations before rather than after
directing respondents to ‘jump’ to a new question;


■ making any sequencing instructions very obvious;


■ providing for all possible response variations, including not applicable,
zero, and non-response;


■ avoiding ‘leading’ questions which assume a certain response to a
question not explicitly asked (e.g. ‘Which cinemas do you attend?’
assumes the respondent goes to cinemas);


■ making questions simple: complicated or double-barrelled questions
increase the likelihood of errors and non-response as well as making
responses difficult to interpret; and
■ trying to reduce memory bias. Respondents tend to remember what
should have been done rather than what was done, and they tend to
include in a reference period events that occurred outside the
reference period. Where possible, framing questions that relate to
respondents’ own record keeping can enhance accurate reporting.
Minimising the recall period also helps to reduce memory bias. For
example, recalling events that happened ‘last month’ is easier than
recalling those from ‘the same month one year ago’.


It is also important to consider how the answers on the form will be
processed to produce statistics. To facilitate accurate processing of the
survey data, space should be provided on the form for coding answers. If
the data are to be entered into a computer system directly from the
questionnaire, the codes for each type of answer should be displayed
clearly on the form. See chapter 9 for a discussion of optical mark
recognition (OMR) and intelligent character recognition (ICR).


Testing of questionnaires should be conducted during the questionnaire
development stage. It is essential that questionnaire testing be
implemented for all new surveys and for existing surveys to which
substantial modifications have been made, in order to determine
whether the objectives are likely to be met by the proposed
questionnaire. See chapter 6 for detailed information.


QUESTION TYPES


Questions may generally be classified as one of two types, open or
closed, according to the degree of freedom allowed in answering the
question. In choosing question types, consideration should be given to
factors such as the kind of information which is sought, ease of
processing, and the availability of time, money, and personnel.
[Diagram: BIRTHPLACES/LANGUAGES: Q. 14-18. Sequence of questions asked of
all household members over 15 years.]

Q. 14  In which country were you born? (branches: Australia; All others)
Q. 15  In what year did you arrive in Australia?
Q. 16  Did you have any family in Australia just before you came to this
       country?
Q. 17  Do you speak a language other than English at home? (branches:
       Yes; No)
Q. 18  What other languages do you speak at home?
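

The kind of sequencing shown in the diagram can be written down as a
routing rule. The sketch below is an illustration only. The branch targets
are not legible in the diagram as reproduced here, so the routing is an
assumption: respondents born in Australia are taken to skip from Q. 14 to
Q. 17, and respondents answering 'No' at Q. 17 are taken to skip Q. 18. The
function names are hypothetical.

```python
from typing import Optional


def next_question(current: str, answer: str) -> Optional[str]:
    """Return the next question to ask, or None when the sequence ends.

    The branch targets are assumed for illustration; they are not taken
    from the diagram itself.
    """
    reply = answer.strip().lower()
    if current == "Q14":
        return "Q17" if reply == "australia" else "Q15"
    if current == "Q15":
        return "Q16"
    if current == "Q16":
        return "Q17"
    if current == "Q17":
        return "Q18" if reply == "yes" else None
    if current == "Q18":
        return None
    raise ValueError(f"unknown question: {current}")


def questions_asked(answers: dict) -> list:
    """List the questions a respondent is routed through, starting at Q14."""
    route = []
    question = "Q14"
    while question is not None:
        route.append(question)
        question = next_question(question, answers.get(question, ""))
    return route


print(questions_asked({"Q14": "Australia", "Q17": "No"}))
print(questions_asked({"Q14": "Italy", "Q15": "1975", "Q16": "Yes",
                       "Q17": "Yes", "Q18": "Italian"}))
```

Writing the routing out in this way makes it easy to check that every
respondent reaches an end point and that nobody is asked a question that
does not apply to them.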
Open questions


Open questions allow respondents to answer the questions in their own
words. The answer box or area should allow sufficient space for a high
percentage of the likely answers, for example:


    What is the main kind of activity carried out by this business?

    ________________________________________________


    How many employees does this business have? .......... [        ]


It is often a good idea to provide some directions or examples on how
to answer an open question and to make it clear how the respondent
should answer if a ‘not applicable’, ‘zero’, or ‘non-response’ applies.


    How many employees does this business have?

    Note
    ■ Write “NIL” if there are

The questions need to be understood by all respondents in the same
way. Even an apparently simple question can be understood in different
ways. For example, the question ‘How much diesel fuel did this business
use in the year 1998-99?’ may be answered in litres, in gallons or as a
dollar value. A more precise way of asking for this information, where
quantity could be derived from value, is shown below:


    What was the amount spent on diesel fuel during 1998-99? ...
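

The guide does not say how quantity would be derived from value. One
simple possibility, assumed here for illustration, is to divide the amount
spent by an average price per litre obtained from another source. The
function name and the figures are hypothetical.

```python
def litres_from_spend(amount_spent: float, average_price_per_litre: float) -> float:
    """Estimate litres used from dollars spent, given an assumed average price."""
    if average_price_per_litre <= 0:
        raise ValueError("average price per litre must be positive")
    return amount_spent / average_price_per_litre


# Hypothetical figures: $12,000 spent at an average of $0.75 per litre.
print(litres_from_spend(12000, 0.75))  # 16000.0 litres
```

The derived quantity is only as good as the price assumed, but every
respondent has answered the same clearly defined question.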


The advantages of open questions are that they allow many possible
answers and they can collect exact values from a wide range of possible
responses. Open questions are often used in initial pilot testing to
determine the range of possible answers and the availability of the data
being sought.


The disadvantages are that they are more demanding than closed 
questions both to answer and to process. In particular, processing 
problems arise from the need to create a coding frame to interpret a 
variety of responses, and from the difficulty of reading poor handwriting. 
Closed questions


Closed questions can be of several types, reflecting the style of response 
permitted. They are generally cheaper, easier to answer, and easier to 
process than open questions. Closed questions are appropriate when the 
researcher can anticipate most of the responses and when exact values 
are not needed. The disadvantage of closed questions is that they require 
significantly more effort than open questions in the development and 
testing stages. 


Limited choice questions are those which require a respondent to choose
one of two mutually exclusive answers (e.g. Yes/No).


    Did you buy a car in the period 1 July 1998 to 30 June 1999?

    No    [ ]
    Yes   [ ]


Multiple choice questions require a respondent to choose one of a
number of responses provided.


    How many litres of diesel fuel did this business use in year 1998?
    Tick one only

    Less than 100 litres   [ ]
    100 to 500 litres      [ ]
    Over 500 litres        [ ]





Checklist questions allow a respondent to choose more than one of the
responses provided.


    Have you ever heard of or are you aware of:
    Tick all the boxes that apply

    Neighbourhood houses    [ ]
    Drop-in centres         [ ]
    Senior citizens clubs   [ ]
    None of the above       [ ]





Partially closed questions provide a set of responses where the last
alternative is ‘Other, Please specify’. Partially closed questions are useful
when it is difficult or impractical to list all possible choices.


    What is the legal status of this business?
    Tick one only

    Sole proprietor          [ ] 01
    Partnership              [ ] 02
    Incorporated company     [ ] 03
    Other (Please specify)   [ ] 04
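

The codes printed beside each box are what make a closed question easy to
process. As a minimal sketch, the legal status question above could be
coded as follows. The function and variable names are hypothetical, and
the codes 01 and 02 are inferred from the sequence of codes on the form
rather than read directly from it.

```python
from typing import Tuple

# Codes as displayed beside each response box (01 and 02 are inferred).
LEGAL_STATUS_CODES = {
    "sole proprietor": "01",
    "partnership": "02",
    "incorporated company": "03",
}
OTHER_CODE = "04"


def code_legal_status(ticked: str, other_text: str = "") -> Tuple[str, str]:
    """Return (code, free text) for the box ticked by the respondent."""
    key = ticked.strip().lower()
    if key in LEGAL_STATUS_CODES:
        return LEGAL_STATUS_CODES[key], ""
    if key == "other":
        if not other_text.strip():
            raise ValueError("'Other' was ticked but nothing was specified")
        return OTHER_CODE, other_text.strip()
    raise ValueError(f"unrecognised response: {ticked!r}")


print(code_legal_status("Partnership"))     # ('02', '')
print(code_legal_status("Other", "Trust"))  # ('04', 'Trust')
```

Only the 'Other' responses carry free text that still needs to be read and
classified by hand, which is where a partially closed question sits between
a fully closed question and an open one.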


Attitudinal questions generally seek to locate a respondent’s opinion on
a rating scale with a limited number of points (usually five). For
example, respondents may be asked to indicate their level of satisfaction
with a given statement.


    What is your level of satisfaction with your skills in using
    Lotus 123?

    Very high    High    Fair    Low    Very low
       [ ]       [ ]     [ ]     [ ]      [ ]





Alternatively, respondents may be asked to rate, for example, the
cleanliness of their bus line according to the following scale:


    [Numeric rating scale]


In other cases, respondents may be asked to rate, for example, their train
service on waiting time by recording their responses on the following
rating scale:


    good  [rating scale]  bad





Special care should be taken when designing attitudinal questions 
because: 


■ they are interpreted subjectively and this interpretation can differ
between respondents;


■ respondents may have difficulty interpreting the scale correctly;


■ if there are a large number of similar questions, respondents are likely
to answer in a hurried or careless fashion; and


■ expressions of attitude can differ markedly from actual behaviour.
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Choosing between question 
types 


SEQUENCING 


Logical structure 


The choice between question types depends upon a number of factors 
including: 

= the researcher’s data requirements; 

= the level of accuracy needed; 


s the sort of information which is potentially available from 
respondents; 


= the processing system to be used to code and analyse the survey 
results; 


m the experience of respondents (i.e. whether the survey is to be 
conducted regularly or once only); 


= the position of questions on the form; and 


= the sensitivity of the question—(closed questions generally elicit more 
positive responses to sensitive topics than open questions). 


Once the questions have been chosen they should be tested to determine 
whether the best choice has been made. See chapter 6 for survey testing 
details. 


The sequence of questions should be designed so as to: 


encourage respondents to complete the questionnaire and to maintain 
their interest in it; 


= facilitate respondents’ recall; 

= direct respondents to the information source; 

= be relevant to respondents’ own record keeping, if any; 
= appear sensible to respondents; and 

= focus on the issue under consideration. 


The questions on a form should follow a sequence that is logical to the 
respondents. Regardless of the method used to administer the 
questionnaire, the sequence should flow smoothly from one question to 
the next. A smooth progression through the questions is particularly 
important if the questionnaire is answered in unfavourable conditions 
(e.g. in a dim room, or outdoors in wind or rain). 


It is a good idea to start the questionnaire with simple, straightforward 
questions which both promote interest in the survey and establish 
respondents’ confidence in their ability to answer the remaining 
questions. In particular, the opening questions should establish (if 
necessary) that the respondent is a member of the survey target 
population.


The remaining questions should be logically structured so that the
interviewer or respondent does not need to alternate between pages of 
the questionnaire. Questions on related topics should be grouped 
together and all questions on a particular topic should be asked before 
proceeding to another topic. Care should be taken to use a logic or 
grouping that reflects the understanding of the respondents targeted for 
the survey. 


Sensitive questions

Questions which may be sensitive to respondents should generally not be
placed at the beginning of a questionnaire. Rather, they should be placed 
in a section of the form where they are most meaningful in the context 
of relevant questions. In this way the format of the questionnaire can act 
as a buffer to help the respondent feel more comfortable with sensitive 
questions. For example, the item ‘owners’ drawings’ from an
unincorporated business might be sensitive, but putting it in the context 
of an income statement (comprising net sales, cost of sales, gross margin, 
other expenses, operating income, owners’ drawings, and net income) 
may make it less sensitive. Also, placing sensitive questions last minimises 
the impact of a possible refusal. 


Filtering

Filtering can be used to ensure that respondents answer only those parts
of the questionnaire that are relevant. This is achieved by the use of filter 
questions which direct respondents to skip questions that do not apply 
to them. Filter questions also help respondents to understand the 
sequence of questions and they are simpler to follow than conditional 
questions. An example of a filter question is: 


1 Does this business have a parent company?

   No  [ ]  Go to 3
   Yes [ ]


2 What is the name of the parent company?

   ____________________


This filter question is preferable to the following conditional question: 








1 If this business has a parent company, what is its name?

   ____________________


With conditional questions it is not clear whether a blank answer
represents a non-response or a ‘not applicable’ response.
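

The benefit of a filter question at the data-processing stage can be sketched in Python as follows. This is a minimal illustration only: the function name, the labels it returns, and the company name used in the usage lines are assumptions made for this example and are not part of any ABS processing system.

```python
NOT_APPLICABLE = "not applicable"
NON_RESPONSE = "non-response"


def classify_parent_company_answer(has_parent, parent_name):
    """Classify the answer to question 2 using the filter answer to question 1.

    has_parent  -- "Yes", "No", or None if question 1 was left blank
    parent_name -- text written for question 2, or "" if left blank
    """
    if has_parent == "No":
        # The respondent was directed to skip question 2 ("Go to 3").
        return NOT_APPLICABLE
    if parent_name.strip():
        return parent_name.strip()
    # Question 2 applied (or the filter was unanswered) but nothing was written.
    return NON_RESPONSE


print(classify_parent_company_answer("No", ""))                # not applicable
print(classify_parent_company_answer("Yes", ""))               # non-response
print(classify_parent_company_answer("Yes", "Acme Holdings"))  # Acme Holdings
```

With the conditional form of the question there is no first argument to consult, so the first two cases could not be told apart.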


In general, filter questions should place the ‘No’ response before the 
‘Yes’ response because it is usually the ‘No’ response which directs 
respondents to a subsequent question. Respondents then do not have to 
read the ‘Yes’ response.


Filter questions are useful in situations where respondents are being
asked for attitudinal information. They are used to ensure that the
respondent actually has an attitude or a view about a particular topic or
subject before asking for that view. For example, a respondent may be
asked ‘Do you have views about <topic>?’ before being asked those
views in order to ensure that people who have no real interest in a topic
do not influence the outcome of that part of a survey.


LANGUAGE


Careful wording of questions is essential to ensure that respondents
understand questions correctly and do not misinterpret them. When
framing the wording of questions a number of techniques can be used to
facilitate correct and quick understanding:

■ Use short sentences which convey a single item of information rather
than long sentences.

■ Arrange clauses in chronological order within each sentence. For
example, the sentence ‘Read the instructions then fill in the form’ is
more quickly understood than ‘Before you fill in the form, read the
instructions’.

■ Ask positive questions rather than negative ones. For example, ‘Are
you: married, single, etc? Tick one.’ is easier to understand and can
be answered more quickly than ‘Place an X in the box next to those
items which do not apply’.

■ Use sentences in the active voice rather than the passive voice. Active
voice is when the subject performs the action, e.g. ‘The operator is to
complete this form.’ Passive voice is when the subject is acted upon,
e.g. ‘This return is to be completed by the operator’.

■ Avoid making nouns out of verbs. For example, ‘The Managing
Director must certify this document’ is clearer and more direct than
‘Certification of this document must be done by the Managing
Director’. (In this example, making a noun out of a verb also requires
passive voice to be used.)

■ Use a conversational style rather than omitting helpful phrases for the
sake of brevity. An example of conversational style is ‘Please comment
on any unusual events which affected your agricultural activity this
year. Some examples are drought, flood, fires, and hailstorms’. This
example is clearer than ‘Comment here on unusual circumstances
(drought, flood, fires, hailstorms, etc.)’.

■ Use words of one or two syllables rather than longer words unless the
longer words are very familiar to most people.

■ Provide some context for words whose meaning can change in
different circumstances. It is important to describe briefly the purpose
of the survey and how the statistics will be used.

■ Avoid using technical and statistical terms or, if they must be used,
explain them in plain English.

■ Use simple punctuation such as commas, full stops, and question
marks. If a semi-colon or a succession of commas is needed, the
sentence should be broken into at least two shorter sentences.


In general, the wording of questions should be as direct as possible and
should avoid ambiguity, over-generality, and vague words such as
‘occasionally’ and ‘often’. The meaning a researcher attaches to a
word may not be the meaning respondents attach to it. Things which
appear clear to people in the know are often not so clear to the general
population. The best way to find out is to test the questions with a
group of respondents. If necessary, the questions can then be re-worded
and re-tested until the respondents’ understanding of the questions
matches the researcher’s understanding.


PHYSICAL DESIGN


The questionnaire should be physically set out so as to minimise the
time needed to interview, respond, and process the results. Specifically,
consideration should be given to the form’s construction, graphics, and
layout. More detailed guidelines are documented in The Form Designer’s
Quick Reference Guide by Robert Barnett (second edition, 1994).
Numerous technical papers and books on questionnaire design are also
available from libraries.


Construction


The number of pages should be as many as are needed for a clear
layout. A small, short form may be cramped, difficult to read and
complete and may compromise the results of the survey. It is also
important for the paper to be sufficiently opaque so that writing and
printing on one side of the paper do not show through to the other
side. If a booklet is used, the staples should be on the spine. (Printers
refer to this as saddle-stitching.)


Graphics

The speed and accuracy of responses can be enhanced by using the 
following graphics guidelines which cover the areas of typography, 
colour, and ruled lines: 


Typography

■ Line length should be no more than can be read in two or three eye
fixations (about 115 mm), which is an advantage for poor readers.

■ Upper case text is difficult to read, and should be avoided where
possible.

■ Avoid ornate and decorative typefaces. Serif type fonts (e.g. Times) are
easier to read for questions than sans serif types (e.g. Helvetica).

■ The top line of questions should overhang the bottom line to prevent
poor readers from skipping to the end of the bottom line without
reading the whole question.

■ Left align text where possible.

■ Leading is another point to consider and refers to the amount of
space between lines on the form.


Colour

■ The background colour of the form should contrast sufficiently with
the text to facilitate reading and office processing. For example, black
text should be used with white or orange backgrounds.


Ruled lines

■ These are used for dividing columns, sections, and questions, and for
defining screen boundaries, answer box boundaries, and guidelines for
writing in answer boxes.

■ Use the minimum number of lines consistent with what is necessary.

■ Lines should be as thin as possible to do the job.

■ If using Yes/No tick boxes, be consistent in whether ‘Yes’ or ‘No’
appears first. It is generally preferable for ‘Yes’ to appear first
although when using filter questions, the ‘No’ response will usually
appear first.

■ The aim should be to unclutter the page by removing all unnecessary
ink.


Layout

Two basic principles should be followed when designing the layout of a 
form. Firstly, the graphics standards (discussed in this chapter) should be 
applied consistently throughout the form. Secondly, the sequence of 
material presented in the form should match the sequence that 
respondents are expected to follow when filling out the form. Any notes 
to questions should appear with the relevant questions so that 
respondents do not have to alternate between different parts of the form. 
Enabling respondents to progress through the form one step at a time 
reduces the likelihood of errors. 


Page margins should be 5 mm for the top, bottom and two side margins. 
The layout within these boundaries can be either full-page 
(single-column) format or split-page (double-column) format. These two 
formats should not be mixed on the one page. If instructions or 
explanations are to be incorporated with the questions, full-page format 
is preferable as there is sufficient room to allow for this. If a large 
number of short questions and answers are being used, split-page format 
offers a better use of space. Split-page format also has the advantages of 
being easier to read (owing to shorter lines) and of providing a clearer, 
linear progression for respondents to follow. Split-page format should 
not be used where a matrix is included covering the full width of the
page.


SUMMARY


Questionnaire design begins by determining the data which are to be
produced by the survey to meet desired aims and objectives and devising 
a list of questions to obtain these data. Careful consideration should be 
given to a number of factors including the types of questions to be used, 
the logical sequence and wording of questions, and the physical design 
of the form. It is important to test each of these aspects of questionnaire 
design with a group of respondents before finalising the questionnaire. If 
necessary, the form can then be modified and re-tested until respondents 
can complete it accurately and quickly with a minimum of errors. See 
chapter 6 for details of survey testing.


CHAPTER 4

SAMPLE DESIGN


INTRODUCTION


Sample design covers the areas of sampling frame, sample size, and
sampling methodology. Aspects to be considered within these areas
include:

■ accuracy required

■ cost

■ timing

■ strata


This chapter describes some alternative sample designs and how to
choose the most appropriate one for a particular researcher’s
requirements. It should be noted that the descriptions given are outlines
and additional assistance from a statistician is recommended. In
particular, efficient sample design can introduce complexities that are
best dealt with by an expert survey methodologist.


SAMPLING FRAME

A sampling frame is a list of all members (e.g. persons, households, 
businesses, schools) of the target population for the survey. For example, 
a sampling frame may be the electoral roll, the membership list of a 
club, or a register of schools. Alternatively, the frame may cover one 
stage of a multistage sample. For instance, the frame may be a list of 
schools from which students will be surveyed. In some cases, researchers 
may need to construct their own frames. 


For most sampling methodologies it is important to have a complete list
from which to select a sample; otherwise the sample may not accurately
represent the target population. In practice, however, it can be difficult 
to compile a complete and reliable list of all population members. Any 
known deficiencies in the coverage of the sampling frame should be 
stated when the survey results are documented. Flaws in the sampling 
frame can include omissions, duplications, and incorrect entries. 
Omissions are very common and can be particularly serious as the 
omitted members may have a common characteristic. If there are too 
many flaws in the frame then the survey results should be used to 
generalise only about those types of population members which are 
included in the sampling frame. 


Each member of the sampling frame should have a known non-zero 
probability of being selected in the sample. If a suitable sampling frame 
does not exist and cannot readily be constructed by the researcher then 
an alternative method of collecting data should be considered (see 
chapter 2).


SAMPLE SIZE


Resources and accuracy


Choosing sample size for a survey involves considering factors such as:

■ the resources (of time, money, and personnel) available to conduct
the survey; and

■ the level of accuracy required for the results.


These are distinct but related quantities. Further considerations are:

■ the amount of detail needed in the results;

■ the proportion of the population with the attributes being measured;

■ whether members of the target population differ greatly from one
another on those attributes (i.e. the variability of the attributes being
measured);

■ the expected levels of non-response; and

■ the sample design used.


Estimates that are based on information from a sample of units in a 
population are subject to sampling variability. That is, they may differ 
from the figures that would have been obtained had the entire 
population been surveyed. A large sample is more likely than a small 
sample to produce results that closely resemble those that would be 
obtained if a census was conducted. This difference between survey 
results, or estimates, and census results can be measured by the standard 
error. 


When planning a sample survey, a researcher may wish to minimise the 
size of the standard error in order to maximise the accuracy of the 
survey results. In this event, the sample size can be as large as resources 
permit. Alternatively, the researcher may wish to specify in advance the 
size of the standard error to be achieved in order to minimise the costs 
of the survey. In this case, the sample size is chosen to produce the 
specified size of standard error. 


The standard error is used to construct a confidence interval which is
expected to include the ‘true value’. A 68% confidence interval is
approximately the survey estimate plus or minus one standard error of the
estimate. A 95% confidence interval is approximately the survey estimate
plus or minus two times the standard error of the estimate. A 99%
confidence interval is, as a conservative rule of thumb, the survey
estimate plus or minus three times the standard error. In practice, this
means that in at least 99 out of 100 samples the confidence interval is
expected to contain the ‘true value’ for the target population.
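

As a rough sketch, these rules of thumb can be written as a small Python function. The function name and the illustrative figures (an estimate of 60% with a standard error of 4%) are assumptions chosen for this example.

```python
def confidence_interval(estimate, standard_error, multiplier=2):
    """Return (lower, upper) as estimate +/- multiplier * standard_error.

    multiplier 1, 2 and 3 correspond roughly to the 68%, 95% and 99%
    confidence levels described in the text.
    """
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin


low, high = confidence_interval(0.60, 0.04)
print(f"{low:.2f} to {high:.2f}")  # 0.52 to 0.68

low, high = confidence_interval(0.60, 0.02)
print(f"{low:.2f} to {high:.2f}")  # 0.56 to 0.64
```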


For example, if a survey finds that 60% of respondents are in favour of a
proposal and the standard error of the estimate is 4%, we can be
95% confident that the ‘true value’ lies between 52% and 68%. If the
researcher wants to be 95% confident that the ‘true value’ lies between
56% and 64%, a standard error of 2% is required. A reduction in the
standard error reduces the range of the confidence interval, but requires
a corresponding increase in the sample size. For a simple random sample,
halving the standard error requires roughly four times the sample size.


Determining sample sizes for a simple random sample


To demonstrate how accuracy requirements are used in determining 
sample size, the following example is provided. 


This example assumes a simple random sample is to be selected from an
infinite population; the formula used will nevertheless give reasonable
results for most large population sizes encountered. If more complex
sampling techniques, such as clustering or stratification, are used,
alternative formulae and calculations are required. The proportion p being
estimated by the survey needs to be roughly known in advance from
supplementary information.


A researcher wishes to measure the proportion of dwellings with 
electrical safety switches installed. The government safety authority 
believes that the proportion is approximately 40% and needs to know 
how many dwellings should be sampled to obtain an estimate with a 
95% confidence limit. 


The researcher could interpret the requirement of the safety authority as: 


p = 0.4
CI level = 95%
CI range = 0.35 to 0.45


where
p stands for proportion
CI stands for confidence interval


The standard error required to achieve this can be calculated in the 
following manner: 


SE = (p - lower value of CI range) / 2
   = (0.4 - 0.35) / 2
   = 0.025


If a survey is designed to measure simple proportions, without any 
requirement for complex classifications of estimates (e.g. cross-classifications),
the following formula can be used to determine the size of the sample: 


n = pq / SE^2

where

n = sample size
p = sample proportion
q = 1 - p
SE = required standard error of the sample proportion
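

The formula can be sketched in Python as follows, using the figures from this example (p = 0.4 and a 95% confidence interval of 0.35 to 0.45). The function names are assumptions made for illustration, and the result is rounded to the nearest whole unit; rounding up would be the more cautious choice in practice.

```python
def required_standard_error(p, ci_lower, multiplier=2):
    """Standard error needed so that p - multiplier * SE equals ci_lower."""
    return (p - ci_lower) / multiplier


def sample_size(p, se):
    """Sample size n = p * q / SE**2 for a simple random sample."""
    q = 1 - p
    return round(p * q / se ** 2)


se = required_standard_error(0.4, 0.35)
print(round(se, 3))          # 0.025
print(sample_size(0.4, se))  # 384
```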


The sample size required for this example is calculated as follows: 


n = (0.4 × 0.6) / (0.025 × 0.025)
  = 384


If this survey was then completed with a sample size of n = 384 and it
was found that the sample proportion p was 0.3 (not 0.4 as believed),
then the standard error of this sample proportion of 0.3 would be 0.023,
i.e. 2.3% (rearranging the above formula to give SE = square root of
pq / n). The 95% confidence interval for the sample proportion (i.e. 0.3)
would be from 0.254 to 0.346.


The following table provides some examples of sample sizes required for
certain standard error sizes and sample proportions.


The table assumes that the samples are being randomly drawn from an
infinite population.


SAMPLE SIZES

                     Standard error of sample proportion (%)

Sample
proportion     0.5     1.0     1.5     2.0     2.5     3.0     3.5     4.0     4.5     5.0
         %     no.     no.     no.     no.     no.     no.     no.     no.     no.     no.
        10   3 600     900     400     225     144     100      73      56      44      36
        20   6 400   1 600     711     400     256     178     131     100      79      64
        30   8 400   2 100     933     525     336     233     171     131     104      84
        40   9 600   2 400   1 067     600     384     267     196     150     119      96
        50  10 000   2 500   1 111     625     400     278     204     156     123     100
        60   9 600   2 400   1 067     600     384     267     196     150     119      96
        70   8 400   2 100     933     525     336     233     171     131     104      84
        80   6 400   1 600     711     400     256     178     131     100      79      64
        90   3 600     900     400     225     144     100      73      56      44      36
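

The table can be regenerated with a few lines of Python, which may be useful for checking other combinations of proportion and standard error. This is a sketch under the same assumptions as the table (simple random sampling from an infinite population); the function name is hypothetical. Working in per cent keeps the arithmetic exact for these values, because the factors of 100 cancel.

```python
def sample_size_from_percent(p_percent, se_percent):
    """n = p * q / SE**2 with p, q and SE all expressed in per cent."""
    return round(p_percent * (100 - p_percent) / se_percent ** 2)


standard_errors = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

# Header row, then one row per sample proportion from 10% to 90%.
print("p (%)" + "".join(f"{se:>8}" for se in standard_errors))
for p in range(10, 100, 10):
    row = [sample_size_from_percent(p, se) for se in standard_errors]
    print(f"{p:>5}" + "".join(f"{n:>8}" for n in row))
```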





Use of relative standard error in the determination of sample size


Relative standard error is defined as:


relative standard error = standard error of the proportion / proportion
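

A short Python sketch of this definition follows. It assumes a simple random sample and takes the standard error of a proportion as the square root of pq / n, which is the sample size formula n = pq / SE^2 rearranged; the function names and the sample size of 384 are illustrative choices only.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion p from a simple random sample of n."""
    return math.sqrt(p * (1 - p) / n)


def relative_standard_error(p, n):
    """Standard error expressed as a fraction of the proportion itself."""
    return standard_error(p, n) / p


for p in (0.1, 0.3, 0.5):
    se = standard_error(p, 384)
    rse = relative_standard_error(p, 384)
    print(f"p = {p:.1f}  SE = {se:.3f}  RSE = {rse:.1%}")
```

For a fixed sample size, the smaller proportions have the smaller standard errors but the larger relative standard errors.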


Relative standard error is usually preferred in the determination of 
sample size because of its greater objectivity of application across 
different values of the sample proportion p.


Cross-classification


If the survey seeks to produce detailed results that include
cross-classifications, it is important that the sample size of each sub-group
be large enough to produce reliable estimates (i.e. with low standard
error) for the sub-group. A useful approach is to draw up a blank table
showing all the characteristics to be cross-classified. The more cells there
are in the table, the larger will be the sample size needed to produce
reliable estimates. This larger sample size, in turn, will require more
resources to conduct the survey.


The number of cells in a table is determined by both the number of
characteristics to be cross-classified and the number of categories for each
characteristic. For example, a table cross-classifying 10 age categories by
2 sex categories by 10 birthplace categories will have 200 cells, a table
cross-classifying 5 age categories by 2 sex categories by 2 birthplace
categories will have 20 cells, and a table cross-classifying 10 age
categories by 2 sex categories will have 20 cells.


Non-response


Sample size should be increased to compensate for expected levels of
non-response. However, the characteristics of non-respondents may differ
markedly from those of respondents. The survey results could therefore
be misleading even if a sufficient number of responses is obtained to
produce low standard errors. The higher the non-response rate, the more
accentuated this effect will be because the sample represents less of the
target population. Selecting larger sample sizes to achieve target response
levels may not be an appropriate means of compensating for high
non-response as those responding may still be unrepresentative of the target
population. The first aim should be to minimise non-response.


SAMPLING METHODOLOGY


A number of alternative methodologies can be used to select a sample
for a survey. The choice between these methodologies depends on
considerations such as the nature of the target population, the nature of
any supplementary information that can be obtained, the levels of
accuracy desired, the availability of sampling frames, personnel,
processing facilities, funds, and the time available to complete the survey.
Each of these factors can influence the accuracy of the survey estimates
for a given sample size. The reverse is also true: for a given level of
accuracy these factors can affect the sample size required.


Simple random sampling

With simple random sampling, each member of the sampling frame has 
an equal chance of selection and each possible sample of a given size has 
an equal chance of being selected. Every member of the sampling frame 
is numbered sequentially and a random selection process is applied to 
the numbers. The random selection process may involve, for example, 
using a table of random numbers or randomly selecting numbered balls. 
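

On a computer, a pseudo-random number generator can stand in for the table of random numbers. The following Python sketch assumes a frame numbered from 1 to 6,000 and a sample of 300; these figures, the function name, and the seed are illustrative assumptions.

```python
import random


def simple_random_sample(frame_size, sample_size, seed=None):
    """Select sample_size distinct numbers from 1..frame_size, each equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return sorted(rng.sample(range(1, frame_size + 1), sample_size))


selected = simple_random_sample(6000, 300, seed=2024)
print(len(selected), selected[:5])
```

Selection is without replacement, so no member of the frame can be chosen twice.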


The advantage of simple random sampling is that it is simple and easy to 
apply when small samples are involved. The disadvantages are that it 
requires a complete list of members of the target population and it is 
very cumbersome to use for large samples.


Systematic sampling

Systematic sampling uses a fixed interval to select members from a
sampling frame. For example, every twentieth member may be chosen
from the frame. The size of the interval I is calculated by dividing the size
of the target population N by the size of the sample required n, as follows:


I = N / n


The members of the frame must first be numbered sequentially. A random
number is then chosen between one and the size of the sampling interval
I. The member corresponding to that number is selected in the sample
together with every following Ith member on the list. (Note: If I is not a
whole number, then round it to the nearest whole number.)


For example, a systematic sample of 300 students from a registration list 
of 6,000 would require a sampling interval of 6,000/300=20. The starting 
point would be chosen by selecting a random number between 1 and 20 
from a table of random numbers. If this number was, say, 16, the 
sixteenth student on the list would be selected in addition to every 
following twentieth student. The sample of students would be those 
corresponding to the registration numbers 16; 36; 56; 76; ...; 5,936;
5,956; 5,976; and 5,996.
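

The student example can be reproduced with a short Python sketch. The function name and parameters are assumptions made for illustration; note also that Python's built-in round() rounds exact halves to the nearest even number, which differs slightly from ordinary rounding when the interval is not a whole number.

```python
import random


def systematic_sample(frame_size, sample_size, start=None, seed=None):
    """Return the selected positions (numbered from 1) in the frame."""
    interval = round(frame_size / sample_size)
    if start is None:
        # Random start between 1 and the sampling interval, inclusive.
        start = random.Random(seed).randint(1, interval)
    return list(range(start, frame_size + 1, interval))


selected = systematic_sample(6000, 300, start=16)
print(selected[:4], selected[-4:], len(selected))
# [16, 36, 56, 76] [5936, 5956, 5976, 5996] 300
```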


The advantage of systematic sampling is that it is simpler and easier to 
select one random number and then every Ith (e.g. twentieth) member on
the list than to select as many random numbers as the size of the sample 
(e.g. 300). It also gives a good spread right across the population if the list 
is ordered in a useful way. For example, school students can be ordered 
by year of study and a systematic sample will yield a good spread of 
selections across the years of study as well as increase the accuracy (reduce 
the standard error) of the survey estimates. The disadvantage is that
additional variability can be introduced if the list is ordered in an
unhelpful way. In general the list will be at worst random, but extreme cases
can arise if poor ordering exists. For example, a frame which lists men and 
women on alternate lines would produce a sample of all men or all 
women if the selection interval were an even number. For this reason it is 
always a good idea to check the ordering of your population to see if 
systematic sampling is appropriate. 
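

The alternating-list problem can be demonstrated with a small Python sketch. The frame of 100 people, the intervals, and the starting line are hypothetical values chosen only to illustrate the point.

```python
# Hypothetical frame of 100 people: men on odd lines, women on even lines.
frame = ["man" if line % 2 == 1 else "woman" for line in range(1, 101)]


def systematic_selection(frame, interval, start):
    """Return the frame entries at lines start, start + interval, ..."""
    return [frame[line - 1] for line in range(start, len(frame) + 1, interval)]


even_interval = systematic_selection(frame, interval=20, start=3)
odd_interval = systematic_selection(frame, interval=5, start=3)

print(set(even_interval))         # {'man'}
print(sorted(set(odd_interval)))  # ['man', 'woman']
```

With an even interval every selected line has the same parity as the starting line, so only one sex is ever selected.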


Although a complete list of the target population is required for 
systematic sampling, the list does not necessarily have to be in written 
form. Proxy lists can be used such as individual case records in a file. 


Stratified sampling

If supplementary information is available concerning the composition of
the target population, it may be more efficient to use this information to 
divide the population into groups, or strata. Either simple random 
sampling or systematic sampling techniques are then applied to the strata 
rather than to the population as a whole. The strata should be as 
different from each other as possible while members within each group 
should be as like each other as possible. Some examples of strata 
commonly used by the Australian Bureau of Statistics are States, industry 
size, age and sex.


When planning a stratified sample, a number of practical considerations 
should be kept in mind: 


■ The strata should be designed so that they collectively include all
members of the target population.

■ Each member must appear in only one stratum.

■ The definitions or boundaries of the strata should be precise and
unambiguous.


The five main benefits of stratified sampling are: 


■ The representation of different groups within the sample can reflect
the proportions that occur in the target population (e.g. 60% men,
40% women).

■ Minority groups can be ‘oversampled’. Greater probabilities of selection
can be applied to minority groups than to the majority group. This is
useful if the survey is focusing more on the minority groups than on the
majority group. However, oversampling introduces complexity into the
estimation stage of the survey because it must be taken into account in
the weights used at estimation. For example, if three times as many
men are selected as women from populations of equal size, the weight
allocated to men should be one third of that allocated to women when
calculating population estimates.

■ The results are more accurate. Sampling error is reduced because of
the grouping of similar units.

■ Different selection or interviewing procedures can be applied to the
various strata. This is useful if the strata differ greatly in geography,
topography, customs, language, the availability of maps, materials or
funds, the convenience of administration or their cost parameters.

■ Separate information can be obtained about the various strata.
Stratification permits separate analyses on each group and allows different
characteristics to be analysed for different groups. Stratification also
enables control of an adequate sample in each group. However, analysis
across strata is possible even if stratification occurs, thus it is not
necessary to have each group that is to be analysed in a separate
stratum.
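

The effect of oversampling on the weights can be sketched in Python as follows. All figures are hypothetical: two strata of equal size (6,000 men and 6,000 women), with three times as many men selected as women. The function name is an assumption made for this example, and each weight is taken to be the stratum population size divided by the stratum sample size.

```python
def design_weights(population_sizes, sample_sizes):
    """Weight for each stratum = population size / sample size."""
    return {s: population_sizes[s] / sample_sizes[s] for s in population_sizes}


population_sizes = {"men": 6000, "women": 6000}
sample_sizes = {"men": 300, "women": 100}  # men oversampled 3 to 1
in_favour = {"men": 150, "women": 20}      # hypothetical sample counts

weights = design_weights(population_sizes, sample_sizes)
print(weights)  # {'men': 20.0, 'women': 60.0}

# Weighted estimate of the population proportion in favour.
estimated_total = sum(weights[s] * in_favour[s] for s in weights)
weighted = estimated_total / sum(population_sizes.values())

# Ignoring the weights over-represents the oversampled stratum.
unweighted = sum(in_favour.values()) / sum(sample_sizes.values())
print(round(weighted, 3), round(unweighted, 3))  # 0.35 0.425
```

The weight for men is one third of the weight for women, and the weighted estimate differs noticeably from the unweighted one.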


Stratification is most useful when the stratifying variables are simple to 
work with, easy to observe, and closely related to the topic of the survey. 
However, elaborate stratification should be avoided as difficulties with 
analysing the results can increase as the number of stratifying variables 
increases. For example, the sample sizes of the strata need to be large 
enough to support analysis where desired, so that as the number of 
stratifying variables increases, the sample size can also increase. 


If a stratifying variable is difficult to observe at the sampling stage
(e.g. PC ownership, in a household survey) it should be applied at the
analysis stage instead. This procedure is known as post-stratification.
However, in order to apply post-stratification, it is necessary to know the 
distribution of the post-strata in the target population. 
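

One simple way to apply post-stratification is sketched below in Python. The population shares (40% of households owning a PC) and the sample counts are hypothetical, as is the function name; each weight is the known population share of a post-stratum divided by its share of the achieved sample.

```python
def post_stratification_weights(population_shares, sample_counts):
    """Weight = population share of the post-stratum / its share of the sample."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (sample_counts[group] / total)
        for group in sample_counts
    }


population_shares = {"owns PC": 0.4, "no PC": 0.6}  # assumed to be known
sample_counts = {"owns PC": 250, "no PC": 250}      # observed in the sample

weights = post_stratification_weights(population_shares, sample_counts)
print({group: round(w, 2) for group, w in weights.items()})
# {'owns PC': 0.8, 'no PC': 1.2}
```

Households in the over-represented group are weighted down and those in the under-represented group are weighted up, so that the weighted sample matches the known population distribution.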

Cluster sampling

Cluster sampling involves selecting members of the target population in
groups, or clusters, rather than individually. Each member within a 
selected cluster is included in the sample. Examples of clusters are 
factories, schools, and geographic areas such as electoral subdivisions. 
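

A minimal Python sketch of single-stage cluster sampling is given below. The schools, the student identifiers, and the function name are hypothetical; the clusters themselves are chosen here by simple random sampling, which is only one of several possible ways of selecting them.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and include every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members


schools = {
    "School A": ["A1", "A2", "A3"],
    "School B": ["B1", "B2"],
    "School C": ["C1", "C2", "C3", "C4"],
    "School D": ["D1", "D2", "D3"],
}

chosen, members = cluster_sample(schools, n_clusters=2, seed=7)
print(chosen, members)
```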


The advantages of cluster sampling are that costs are reduced, field work 
is simplified, and administration is more convenient than with 
non-clustered designs. Cluster sampling is particularly suitable for surveys 
aimed at regional, State, national, or even international coverage. Instead 
of the sample being scattered over the entire coverage area, the sample is 
localised in relatively few ‘centres’ (i.e. the clusters). If the survey 
involves face-to-face interviews, cluster sampling facilitates the recruitment 
and teaching of locally based interviewers and reduces travel time and 
costs. Cluster sampling also facilitates the administration of field work 
and the supervision of interviewing. The lighter workload and simplified 
administrative procedures often result in lower costs for expenses such as 
salaries, office supplies, postage, and telephone calls. 


The main disadvantage of cluster sampling is higher sampling error (and 
therefore less accurate results) than for a simple random sample with the 
same sample size. This is because members within a cluster tend to be 
similar while differences between clusters can be large. The extent of the 
increased sampling error depends on how representative the clustered 
sample members are of the target population. In practice, cluster samples 
often need to be larger than simple random samples in order to 
compensate for the higher associated sampling error. In some cases, the
lower costs of cluster sampling permit the sample to be expanded to
the point where sampling error is actually lower than for simple random
sampling with the same cost constraint.
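
One widely used way to gauge this loss of precision, which is not given in this guide, is the design effect approximation deff = 1 + (m - 1) x rho, where m is the average number of sample members per cluster and rho is the intra-cluster correlation. The sketch below applies it under assumed values; the cluster size of 20 and correlation of 0.05 are purely illustrative.

```python
def design_effect(cluster_size, intra_cluster_corr):
    """Approximate ratio of the clustered variance to the simple random
    sampling variance for the same sample size (assumes equal-sized clusters)."""
    return 1.0 + (cluster_size - 1) * intra_cluster_corr


def effective_sample_size(n, cluster_size, intra_cluster_corr):
    """Size of the simple random sample giving roughly the same precision."""
    return n / design_effect(cluster_size, intra_cluster_corr)


deff = design_effect(cluster_size=20, intra_cluster_corr=0.05)
n_eff = effective_sample_size(1000, cluster_size=20, intra_cluster_corr=0.05)
print(f"design effect {deff:.2f}, effective sample size {n_eff:.0f}")
```

Under these assumptions a clustered sample of 1,000 is about as precise as a simple random sample of a little over 500, which is why clustered samples often need to be larger.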


Two-stage and multistage sampling

Two-stage and multistage sampling involves selecting a sample in at least
two stages. At the first stage, large groups or clusters of members of the
target population are selected. These clusters are designed to contain
more members than are required for the final sample. At the second 
stage, members are sampled from the selected clusters to derive the final 
sample. If more than two stages are used, the process of sampling within 
clusters continues until the final sample is achieved. An example of 
multistage sampling is where firstly electoral subdivisions (clusters) are 
sampled from a city or State, secondly blocks of houses are selected from 
within the selected electoral subdivisions and thirdly houses are selected 
from within the selected blocks of houses. 
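
The three-stage example can be sketched in code as follows. This is a minimal illustration using simple random selection at every stage; the frame, the names and the numbers selected at each stage are all hypothetical. For simplicity the whole frame is listed here, whereas in practice lists are needed only within the clusters actually selected.

```python
import random


def three_stage_sample(frame, n_subdivisions, n_blocks, n_houses, seed=None):
    """Select subdivisions, then blocks within them, then houses within blocks.

    frame: dict of subdivision -> dict of block -> list of houses.
    """
    rng = random.Random(seed)
    selected = []
    for subdivision in rng.sample(sorted(frame), n_subdivisions):
        blocks = frame[subdivision]
        for block in rng.sample(sorted(blocks), n_blocks):
            for house in rng.sample(blocks[block], n_houses):
                selected.append((subdivision, block, house))
    return selected


# Hypothetical frame: 6 subdivisions, each with 5 blocks of 10 houses.
frame = {
    f"subdivision_{s}": {
        f"block_{b}": [f"house_{h}" for h in range(10)] for b in range(5)
    }
    for s in range(6)
}

selected_houses = three_stage_sample(
    frame, n_subdivisions=2, n_blocks=2, n_houses=3, seed=1
)
print(len(selected_houses))  # 2 x 2 x 3 houses
```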
As with stratified sampling, a number of practical considerations should
be kept in mind when planning a multistage sample:


■ The clusters should be designed so that they collectively include all
members of the target population.


■ Each member must appear in only one cluster.


■ The definitions or boundaries of the clusters should be precise and
unambiguous. In the case of geographic clusters, natural and
man-made boundaries such as rivers and roads are often used to
delimit the cluster boundaries.


The advantages of multistage sampling are convenience and economy. As
with cluster sampling, multistage sampling makes administration easier
and reduces interviewing costs. Multistage sampling does not require a
complete list of members in the target population and this greatly
reduces the cost of preparing the sample. The list of members is
required only for those clusters used in the final stage. At other stages,
only the clusters need to be listed.


The main disadvantage of multistage sampling is the same as for cluster
sampling, i.e. higher sampling error. To compensate for this, larger
sample sizes are needed. A design issue which needs to be addressed in
working with multistage sampling is the number of units selected in each
stage. If more first stage units are selected, costs will generally increase
while sampling errors decrease. If relatively more second stage units are
selected, the reverse applies. The optimal sampling scheme takes
account of the cost and variance structures of the population and can be
quite complex to determine. Raising the number of clusters selected in the
first stage is likely to have greater cost consequences than raising the
sampling fraction for later stages.


SUMMARY

The concerns of cost versus accuracy have a major bearing on the choice
of both sampling methodology and sample size. The most suitable 
sampling methodology for a particular survey depends on the nature of 
the target population, the availability of a sampling frame, the nature of 
any supplementary information about the target population, the 
requirements of the survey, the availability of resources and the time 
available to complete the survey. Choice of sample size often reflects a 
compromise between the size needed for reliable results and the size for 
which resources are available. However, if there are insufficient resources 
to satisfy the survey’s objectives in the time available, the survey may 
need to be modified or even cancelled. Alternatively, non-survey methods 
could be considered such as those outlined in chapter 1. 


The descriptions given in this chapter are merely outlines and the
mechanical use of these outlines will rarely produce the best results.
Additional assistance from a trained statistician should always be sought
for more complex designs. The Australian Bureau of Statistics may be
able to provide this assistance through its Statistical Consultancy
Service (see contact details on the last page).


CHAPTER 5 SOURCES OF ERROR


INTRODUCTION

Two types of error can occur in sample surveys: sampling error and
non-sampling error. Sampling error arises through selecting a sample of 
only part of the target population. Non-sampling error can occur at any 
stage of a survey and can also occur with censuses (i.e. when every 
member of the target population is included). Sampling error can be 
measured mathematically whereas measuring non-sampling error can be 
difficult. It is important for a researcher to be aware of the causes of 
these errors, in particular non-sampling error, so that they can be either 
minimised or eliminated from the survey. 


SAMPLING ERROR

Sampling error reflects the difference between an estimate derived from a
survey and the ‘true value’ that would be obtained if the whole target
population were included. If sampling principles are applied carefully,
sampling error can be kept to a minimum.


Factors affecting sampling error

The size of the sampling error indicates how different the survey results
are likely to be from the results which would be obtained from a
complete enumeration of the target population. The following factors
influence the size of the sampling error:


■ Sample size. In general, larger samples give rise to smaller sampling
error. However, in order to halve the size of the sampling error it is
necessary to increase the sample size fourfold, which greatly increases
the cost of the survey.


■ Sample design. Stratified sampling generally reduces the size of the
sampling error by reducing the variability of the population to that
within each stratum whereas cluster sampling tends to increase the
error.


■ Sample/population ratio. The larger the sample is as a proportion of
the target population, the smaller will be the sampling error.
However, non-sampling errors may increase as sample size increases.


■ Population variability. When members of a target population differ
widely based on the characteristic being measured, sampling error is
greater than when the members are similar. Sample size should be
increased in order to make the sample more representative of the
target population and to reduce the size of the sampling error.
Cluster sampling increases the size of the sampling error when the
characteristic being measured is clustered in particular areas which
cannot be identified in the sample design stage. Stratified sampling
can reduce sampling error by reducing population variability within
each stratum.
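
The effects of sample size and of the sample/population ratio can be illustrated with the textbook standard error of a mean under simple random sampling, s / sqrt(n), multiplied by the finite population correction sqrt(1 - n/N) when the population size N is known. This formula is not stated in this guide; the sketch below assumes it, and the standard deviation of 20 and the population of 1,600 are illustrative values only.

```python
import math


def standard_error_of_mean(std_dev, n, population_size=None):
    """Standard error of a sample mean under simple random sampling.

    If population_size is supplied, the finite population correction
    sqrt(1 - n/N) is applied (sampling without replacement).
    """
    se = std_dev / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt(1 - n / population_size)
    return se


se_100 = standard_error_of_mean(20, 100)
se_400 = standard_error_of_mean(20, 400)  # four times the sample size
se_400_fpc = standard_error_of_mean(20, 400, population_size=1600)

print(se_100, se_400, round(se_400_fpc, 3))
```

Quadrupling the sample from 100 to 400 halves the standard error, and the error shrinks further when the 400 make up a quarter of the population.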


Measurement of sampling error

One measure of sampling error is called the standard error (the standard
deviation of the estimate). Any estimate derived from a survey has a
standard error associated with it (called the standard error of the
estimate). The standard error is used to determine a range of values, or an
interval, that is expected to contain the ‘true value’ that is being
measured by the survey estimate. Assuming that the target population is
distributed normally (i.e. it follows a bell-shaped curve) on the
characteristic being measured, the interval is usually calculated as being
one, two, or three standard errors above and below the survey estimate.
There is approximately a 95% chance that the interval lying within two
standard errors on either side of the estimate contains the ‘true value’.
This interval is called the 95% confidence interval and is the most
commonly used confidence interval. Other confidence intervals are the 68%
confidence interval (one standard error on either side of the estimate,
with about a 68% chance of containing the ‘true value’) and the 99%
confidence interval (three standard errors on either side of the estimate,
with a better than 99% chance of containing the ‘true value’).


For example, suppose a survey estimate is 50 with a standard error of
10. The confidence interval 40 to 60 has about a 68% chance of containing
the ‘true value’, the interval 30 to 70 has about a 95% chance of
containing the ‘true value’ and the interval 20 to 80 has a better than
99% chance of containing the ‘true value’.
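
The arithmetic of this example can be written as a short sketch. The multipliers of one, two and three standard errors and the quoted confidence levels are the approximate ones used in this chapter; `confidence_interval` is a hypothetical helper.

```python
def confidence_interval(estimate, standard_error, multiplier):
    """Interval of `multiplier` standard errors either side of the estimate."""
    margin = multiplier * standard_error
    return (estimate - margin, estimate + margin)


# Survey estimate of 50 with a standard error of 10.
for multiplier, level in [(1, "68%"), (2, "95%"), (3, "99%")]:
    low, high = confidence_interval(50, 10, multiplier)
    print(f"approx. {level} confidence interval: {low} to {high}")
```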


NON-SAMPLING ERROR

In principle, every operation of a survey is a potential source of
non-sampling error. Some examples of causes of non-sampling error are
non-response, a badly designed questionnaire, respondent bias and
processing errors. The sections that follow discuss the different causes of
non-sampling errors.


Causes of non-sampling error

Non-sampling errors can be grouped into two main types: systematic
and random.


Systematic error (called bias) makes survey results unrepresentative of the 
target population by distorting the survey estimates in one direction. For 
example, if the target population is the population of Australia but the 
sampling frame is just males, then the survey results will not be 
representative of the target population due to systematic bias in the 
sampling frame. 


Random error can distort the results on any given occasion but tends to 
balance out on average. 


Some of the types of non-sampling error are outlined below. 


Failure to identify the target population 

This can arise from the use of an inadequate sampling frame, imprecise 
definition of concepts, and poor coverage rules. Problems can also arise 
if the target population and survey population do not match very well. 


Non-response bias 

Non-respondents may differ from respondents in relation to the
attributes/variables being measured. Non-response can be total (none of
the questions answered) or partial (some questions may be unanswered
owing to memory problems, inability to answer, etc.). To improve
response rates, care should be taken in training interviewers, assuring the
respondent of confidentiality, motivating him/her to cooperate, and
calling back if the respondent has been previously unavailable. ‘Call
backs’ are successful in reducing non-response but can be expensive.


Non-response is a particular problem for surveys of businesses. Care 
needs to be taken to ensure the right contact is reached in the business, 
the data required are available and an adequate follow up strategy is in 
place. Good survey testing practices (see Chapter 6) are vital for all
surveys, especially those of businesses.


Questionnaire 

The content and wording of the questionnaire may be misleading and 
the layout of the questionnaire may make it difficult to accurately record 
responses. Questions should not be misleading or ambiguous, and 
should be directly relevant to the objectives of the survey. 


Interviewer bias 

The way the respondent answers questions can be influenced by the 
interviewer’s manner, choice of clothes, sex, accent and prompting when 
a respondent does not understand a question. A bias may also be
introduced if interviewers receive poor training as this may have an effect
on the way they prompt for, or record, the answers.


Respondent bias 

Refusals and inability to answer questions, memory biases and inaccurate 
information will lead to a bias in the estimates. An increasing level of 
respondent burden due to the number of surveys being conducted has 
resulted in considerable difficulty in encouraging potential respondents to 
participate in a survey. When designing a survey it should be 
remembered that uppermost in the respondent’s mind will be protecting 
their own personal privacy, integrity and interests. Also, the way the 
respondent interprets the questionnaire and the wording of the answer 
the respondent gives can cause inaccuracies to enter the survey data. The
non-availability of data can also prove to be a significant hurdle for a
survey. Careful questionnaire design, effective training of interviewers and
adequate survey testing can overcome these problems to some extent.


Processing errors 

There are four stages in the processing of the data where errors may 
occur: data grooming, data capture, editing and estimation. Data 
grooming involves preliminary checking before entering the data onto the 
processing system in the capture stage. Inadequate checking and quality 
management at this stage can introduce data loss (where data are not 
entered into the system) and data duplication (where the same data are 
entered into the system more than once). Inappropriate edit checks and 
inaccurate weights in the estimation procedure can also introduce errors 
to the data. To minimise these errors, processing staff should be given 
adequate training and realistic workloads. 
Misinterpretation of results

This can occur if the researcher is not aware of certain factors that
influence the characteristics under investigation. A researcher or any
other user not involved in the collection stage of the data gathering may
be unaware of trends built into the data due to the nature of the
collection (e.g. an income estimate from a survey of all persons earning
an income would differ from the estimate produced by a survey conducted
among persons found at home during daytime hours). Researchers should
carefully investigate the methodology used in any given survey.


Time period bias

This occurs when a survey is conducted during an unrepresentative time
period. For example, a survey designed to collect data about the weekly
entertainment expenditure of families in Sydney should not be conducted
in the period of the Royal Easter Show as the results may be affected by
the show itself. If information on people’s recreational patterns is
required, these can be affected noticeably by both the time of week and
the time of year, and such factors would need to be kept in mind when
designing a suitable questionnaire.


Minimising non-sampling error

Non-sampling error can be difficult to measure accurately, but it can be
minimised by:

■ careful selection of the time the survey is conducted;

■ using an up-to-date and accurate sampling frame;

■ planning for ‘call backs’ to unavailable respondents;

■ careful questionnaire design and adequate testing;

■ careful design of the processing system, including edit checks;

■ providing thorough training for interviewers and processing staff; and

■ being aware of all the factors affecting the topic under consideration.


Non-response

Non-response results when data are not collected from respondents. The
proportion of these non-respondents in the sample is called the 
non-response rate. Non-response can be either partial or total. It is 
important to make all reasonable efforts to maximise the response rate as 
non-respondents may have differing characteristics to respondents. This 
causes bias in the results. 
Partial non-response

When a respondent replies to the survey answering some but not all
questions then it is called partial non-response. Partial non-response can
arise due to memory problems, inadequate information or an inability to
answer a particular question. The respondent may also refuse to answer
questions if they:


■ find questions particularly sensitive; or


■ have been asked too many questions (the questionnaire is too long).


Total non-response 

Total non-response can arise if a respondent cannot be contacted
(the frame contains inaccurate or out-of-date contact information or the
respondent is not at home), is unable to respond (perhaps due to
language difficulties or illness) or refuses to answer any questions.


Minimising non-response 

Response rates can be improved through good survey design via short, 
simple questions, good forms design techniques and by effectively 
explaining survey purposes and uses. Assurances of confidentiality are 
very important as many respondents are unwilling to respond due to 
privacy concerns. For business surveys, it is essential to ensure that the 
survey is directed to the person within the organisation who can provide 
the data sought. Call backs for those not available and follow-ups can 
increase response rates for those who, initially, were unable to reply. 


Following are some hints on how to minimise refusals in a personal or 
telephone contact: 


■ use positive language;

■ get the right contact, particularly for business surveys;

■ state how and what you plan to do to help with the questionnaire;

■ stress the importance of the survey and the authority under which the
survey is being conducted;

■ explain the importance of their response as being representative of
other units;

■ emphasise the benefits from the survey results;

■ give assurance of the confidentiality of the responses; and

■ find out the reasons for their reluctance to participate and try to talk
through them.


Other measures that can improve respondent cooperation and maximise 
response include: 


■ public awareness activities including discussions with key organisations
and interest groups, news releases, media interviews and articles, aimed
at informing the community about the survey, identifying
issues of concern and addressing them; and


■ where possible, use of a primary approach letter, which gives
respondents advance notice and explains the purposes of the survey
and how the survey will be conducted.


In the case of a mail survey, most of the points above can be stated in an
introductory letter or through a publicity campaign. Other non-response
minimisation techniques which could be used in a mail survey are:


■ including a postage-paid mail-back envelope with the survey form; and


■ sending reminder letters.


Allowing for non-response 

Where non-response is at an unsatisfactory level after all reasonable
attempts to follow up are undertaken, bias can be reduced by imputation
for item non-response (non-response to a particular question) or
imputation for unit non-response (complete non-response for a unit).


The main aim of imputation is to produce consistent data without going
back to the respondent for the correct values, thus reducing both
respondent burden and costs associated with the survey. Broadly
speaking, the imputation methods fall into three groups:


■ the imputed value is derived from other information supplied by the
unit;


■ values reported by other units are used to derive a value for the
non-respondent (e.g. an average); and


■ an exact value of another unit (called the donor) is used as the value
for the non-respondent (called the recipient).


When deciding on the method of imputation it is desirable to know what 
effect imputation will have on the final estimates. If a large amount of 
imputation is performed the results can be misleading particularly if the 
imputation used distorts the distribution of data. 
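
The second and third groups of methods can be sketched as follows. This is a minimal illustration on invented income values, with a missing response represented by `None`; both helper functions are hypothetical, and the donor is chosen at random here, whereas practical schemes usually choose a donor with similar characteristics. The first group is not shown because it depends on the particular questionnaire.

```python
import random
from statistics import mean, pstdev


def impute_mean(values):
    """Fill each missing value with the average of the reported values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_donor(values, seed=None):
    """Fill each missing value with the exact value of a random donor."""
    rng = random.Random(seed)
    donors = [v for v in values if v is not None]
    return [rng.choice(donors) if v is None else v for v in values]


incomes = [400, None, 600, 500, None]  # hypothetical item non-response
reported = [v for v in incomes if v is not None]

mean_filled = impute_mean(incomes)
donor_filled = impute_donor(incomes, seed=0)

# Mean imputation keeps the average but narrows the spread of the data.
print(mean(reported), mean(mean_filled))
print(round(pstdev(reported), 1), round(pstdev(mean_filled), 1))
```

The narrower spread after mean imputation is one example of how imputation can distort the distribution of the data.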


If at the planning stage it is believed that there is likely to be a high 
non-response rate, then the sample size could be increased to allow for 
this. However, the problem may not be overcome by just increasing the 
sample size, particularly if the non-responding units have different 
characteristics to the responding units. Imputation also fails to totally 
eliminate non-response bias from the results. 
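
The allowance for expected non-response is simple arithmetic: the number of units to select is the number of responses required divided by the expected response rate. The sketch below uses assumed figures (600 responses required, 75% expected response) and a hypothetical helper. As noted above, it addresses the number of responses only, not non-response bias.

```python
import math


def inflate_sample_size(required_responses, expected_response_rate):
    """Number of units to select so that, at the expected response rate,
    about `required_responses` units respond."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    return math.ceil(required_responses / expected_response_rate)


units_to_select = inflate_sample_size(600, 0.75)
print(units_to_select)
```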


Establishing the extent of non-response bias 

If a low response rate is obtained, estimates are likely to be biased and 
therefore misleading. Determining the exact bias in estimates is difficult. 
An indication can however be obtained by: 


■ comparing the characteristics of respondents with those of
non-respondents (e.g. for a survey of attitudes to motor bike racing,
which are known to be age-related, a comparison of the age distribution
of respondents with that of non-respondents would provide an indication
of non-response bias);


■ comparing results with alternative sources and/or previous estimates; and


■ performing a post-enumeration survey on a subsample of the original
sample with intensive follow-up of non-respondents.


Example: Effect of non-response 

Consider a postal survey of 3,421 fruit growers that is run to estimate
the average number of fruit trees on a farm. Even after allowing an initial
period for response, the response rate may still be low. After two
reminders, suppose there was still only a 37% response rate, with the
following results.


                      Cumulative    Combined average
                       responses    (trees per farm)
Initial response             300                 456
After 1 reminder             843                 408
After 2 reminders          1,277                 385


From other information¹, suppose it was known that the overall average
number of fruit trees per farm was 329. If survey results had been published
without any follow-up, then the estimate for the average number of trees
(456) would have been too high as farms with a greater number of trees
appeared to have responded more promptly. With follow-up, more
smaller farms may have sent back survey forms and the estimate (385)
became closer to the true value (329).
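
The pattern in this example becomes clearer when the cumulative figures are unpacked into the average for each wave of responses. The sketch below does this and also derives the average implied for the growers who never responded, assuming the figure of 329 is the average over all 3,421 growers. The combined averages are rounded, so the derived values are approximate.

```python
def wave_averages(cumulative):
    """Recover (responses, average) for each wave from cumulative figures.

    cumulative: list of (cumulative responses, combined average) pairs.
    """
    waves = []
    prev_n, prev_total = 0, 0.0
    for n, combined_average in cumulative:
        total = n * combined_average
        waves.append((n - prev_n, (total - prev_total) / (n - prev_n)))
        prev_n, prev_total = n, total
    return waves


cumulative = [(300, 456), (843, 408), (1277, 385)]
waves = wave_averages(cumulative)

# Average implied for non-respondents if the overall average is 329.
population, overall_average = 3421, 329
n_resp, avg_resp = cumulative[-1]
non_respondent_average = (
    population * overall_average - n_resp * avg_resp
) / (population - n_resp)

for count, average in waves:
    print(count, round(average))
print(population - n_resp, round(non_respondent_average))
```

Each later wave reports fewer trees on average (about 456, then 381, then 340), and the growers who never responded would average about 296, which is consistent with larger farms responding more promptly.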


Benchmarking 

Adjusting the weights so that they sum to known population totals is
referred to as benchmarking. Benchmarking is often used in the ABS to
ensure that population surveys are consistent with results from the
Population Census. In particular, the ABS benchmarks sex and age breakdowns.
Benchmarking will reduce the effect of non-response bias on estimates,
although it will not remove all of the effect.
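
A minimal sketch of the weight adjustment is given below for the school example used in this section, in which 30% of respondents are male and 70% female although the school is half male and half female. The sample of 100 children and the enrolment of 500 are assumed figures, and `benchmark_weights` is a hypothetical helper.

```python
def benchmark_weights(sample_counts, population_counts):
    """Weight per respondent in each group, chosen so that the weights
    in the group sum to that group's known population count."""
    return {
        group: population_counts[group] / sample_counts[group]
        for group in sample_counts
    }


sample_counts = {"male": 30, "female": 70}        # achieved sample
population_counts = {"male": 250, "female": 250}  # from the attendance roll

weights = benchmark_weights(sample_counts, population_counts)
weighted_counts = {g: weights[g] * sample_counts[g] for g in sample_counts}

print({g: round(w, 2) for g, w in weights.items()})
print({g: round(c) for g, c in weighted_counts.items()})
```

Each male respondent carries a larger weight than each female respondent, so the weighted sample is half male and half female, matching the school.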


In some cases, the achieved sample may not accurately represent the
population. This could occur due to the random selection of the sample
or due to differing response rates for separate population groups. We
can use information from other sources to create a more accurate
description of the population. Consider a sample of school children in
which 30% of the respondents are male and 70% of the respondents are
female. Through the school’s attendance roll we have identified that
there are actually 50% males and 50% females in the school. Estimates
that we produce from our sample of children would not accurately
reflect the entire school. To create more accurate estimates we adjust the
weights of the respondents used to derive the estimates, so that they add
up to the population total. In this example, the males’ weight would be
increased while the females’ weight would be reduced.


1 In general such ‘other information’ would of course not be available,
otherwise there would be no need to conduct the survey; the fruit grower
example illustrates, though, the need for follow-up to reduce the effect of
non-response bias.


SUMMARY

Sampling error can be minimised through careful choice of sample design
and sample size, within the constraints of available resources. Sampling error
can be measured and used to determine how close a sample estimate is to
its corresponding ‘true value’ in the target population. Non-sampling error
can be difficult to measure accurately but can be minimised by:


■ careful selection of the time the survey is conducted;

■ using an up-to-date, accurate sampling frame;

■ planning for ‘call backs’ to unavailable respondents;

■ careful questionnaire design;

■ providing thorough training for interviewers and processing staff; and

■ being aware of all the factors affecting the topic under investigation.


If there are remaining doubts about the material covered in this chapter 
additional assistance should be sought from a trained statistician. The 
Australian Bureau of Statistics provides this assistance through its 
Statistical Consultancy Service—see contact details on the last page. 
CHAPTER 6 SURVEY TESTING


INTRODUCTION

Testing survey procedures is an important part of developing any survey.
Testing is used to:


■ assess the suitability of the chosen sampling methodology;

■ estimate sampling error and variability in the target population;

■ estimate likely response rates;

■ identify weaknesses in the sampling frame, questionnaire design, and
the method of data collection;

■ assess field work procedures and processing procedures; and

■ estimate costs.


TYPES OF TESTING

Five main types of testing are used to evaluate survey procedures:
skirmishing, focus groups, observational studies, pilot testing, and dress
rehearsals. Each type is used at a different stage of the survey’s
development and aims to test different aspects of the survey.


Skirmish or pretesting

A skirmish or pretest is an informal test of a questionnaire with
small groups of respondents. The questionnaire used is usually loosely
structured, with many open questions, thereby allowing the researcher to
examine different ways to word questions. The questionnaire is tested by
asking people questions and then getting some feedback from these
people on the questionnaire.


A skirmish provides feedback on issues such as:


■ the level of knowledge needed to answer the questions; and

■ likely responses, and how answers are formulated.


A skirmish is used to detect flaws and awkward question wording, and can
also test alternative designs.


Skirmishes are often carried out at the initial developmental stage of the
questionnaire or when there is insufficient time or resources available to
conduct a focus group or a full pilot test.


Focus groups

A focus group (sometimes also called a discussion group) is an informal
discussion of a topic with a small group of people from the survey 
population, often recruited to meet defined characteristics. It provides 
insight into the attitudes, opinions, concerns, and knowledge, or 
experiences of the participants. It is also particularly useful for learning 
about the scope of a domain, the definitions of items of interest and the 
comprehension of key words. Focus groups also assist in learning about 
subgroups and cultures. 
Focus groups can provide a wealth of detailed qualitative information
because of the in-depth probing method used. Focus groups can help us
to better understand how well respondents understand our concepts,
definitions, question wording, and other issues about the topic. They are
used to understand the range of attitudes or understanding, rather than
to gain quantitative information.


Focus groups are a relatively cheap and easy way to obtain information
in a short period of time. Participants can ‘feed’ off each other, with one
comment causing someone to think of another point.


Their purpose is to explore rather than to definitively describe or explain.
Therefore focus groups are most often used in the very early stages of the
survey development cycle. They are used mostly for new surveys, but can be
used for testing changes to an existing survey before the questions are written.


However, group dynamics can interfere with the discussion, e.g. extroverts
can take over. This can also bias the results towards the dominant
participants. Participants also tend to give ‘public’ opinions and therefore
focus groups are not as suitable for discussion of sensitive issues.


Results from a focus group may be complicated by factors such as:


■ people who are willing to take part in a focus group may not be
representative of the target population; and


■ the ‘open-ended’ nature of responses, and hence the large volume of
information, makes analysis cumbersome.


Focus group research should be regarded as preliminary, with results not
generalised to the whole population without further quantitative research.


Observational studies

Observational studies involve getting respondents to complete the draft
questionnaire in the presence of an observer. Whilst completing the 
form, respondents explain their understanding of the questions and the 
methods required in providing the information. Respondents should be 
made aware that it is the form that is being tested and not the 
respondent. It is also important that the respondent is not given 
assistance in completing the form during an observational study. 


Much can be gained from such studies including identifying problem 
questions through observations, questions asked by the respondents, or 
the time taken to complete particular questions. Data availability and the 
most appropriate person to supply the information can also be obtained 
through observational studies. In the development of business surveys, 
observational studies are also important for identifying the records that 
are referenced when providing the data sought, so that advice can be 
provided to survey respondents about the types of records which would 
assist in obtaining the data sought for the survey. 


Observational studies and/or focus groups can be used to test
respondents’ interpretation of complex concepts. Particular groups of
respondents for which the researcher knows the correct response to a
topic or series of questions can indicate whether participants are able to
respond appropriately to the questions asked.


‘Control groups’ can be formed that have a particular characteristic or
interest that is under investigation. By asking these participants to
complete a draft questionnaire or take part in a focus group, the researcher
can test whether they are able to respond to the questions being asked.
For example, in testing questions on disability, a control group of people
with a disability can be used to test whether they respond appropriately
to the disability questions.


Pilot testing

Pilot testing involves formally testing a questionnaire or a survey with a
small sample of respondents in the same way that the final survey will be
conducted. Pilot testing is used to:


■ identify any problems with aspects of questionnaire design such as the
questionnaire’s format, length, wording of questions, and so on;

■ compare alternative versions of a questionnaire;

■ assess the adequacy of instructions to interviewers; and

■ ascertain interview times.


Dress rehearsals

A dress rehearsal is the final test of a survey where the chosen sampling
methodology is used to select a small sample from the target population.
Dress rehearsals are used to:


■ evaluate survey plans;

■ estimate survey costs per sampled unit (important for staffing and
funding decisions);

■ estimate interview, travel, and interviewer editing time per sampled
unit;

■ estimate variability within the population (i.e. population variances)
and hence sampling error; and

■ evaluate the processing system design.


If appropriate information is available from previous surveys, it may be
possible to estimate population variances and costs from this information
rather than from the results of a dress rehearsal. However, if such
supplementary information is not available, a dress rehearsal can be used
to estimate costs and variances.


Dress rehearsals are normally used only for large-scale surveys.


CONDUCTING SURVEY TESTS


Composition of test samples

The samples chosen for pilot tests and dress rehearsals should be as
representative of the target population as possible. This maximises the 
validity of the test results and ensures that consequent modifications to 
the survey are appropriate. 
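
A representative dress rehearsal gives an estimate of population variability that can feed the design of the full survey. As a minimal sketch, assuming simple random sampling, a numeric survey variable and no finite population correction, the standard error of a mean and the sample size needed for a target standard error might be worked out as follows. The function names and the rehearsal figures are illustrative assumptions only.

```python
import math
import statistics


def standard_error(sample_variance, n):
    """Standard error of a sample mean under simple random sampling."""
    return math.sqrt(sample_variance / n)


def required_sample_size(sample_variance, target_se):
    """Smallest sample size whose standard error does not exceed target_se."""
    return math.ceil(sample_variance / target_se ** 2)


# Hypothetical dress-rehearsal responses (e.g. weekly hours worked).
rehearsal = [32, 40, 38, 45, 20, 36, 41, 39, 28, 35]
s2 = statistics.variance(rehearsal)  # unbiased estimate of the population variance

print("estimated variance:", round(s2, 2))
print("sample size for a standard error of 1 hour:", required_sample_size(s2, 1.0))
```

Other sample designs (stratified, cluster, multistage) need different variance formulas, so this calculation is only a starting point.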
Comparison of techniques

Pilot tests can be used as a basis for choosing between alternative
procedures for part of a survey. One approach to such testing is to split
the test sample into two equal-size samples, one for each procedure, and
to have every interviewer use both of the alternative procedures. The two
techniques can then be compared without the interfering bias of differing
interviewer performance.


Frequency and size of tests

The number of times survey testing should be conducted and the sample
sizes used are determined by the complexity of the survey and the
availability of funds. For example, a simple, small survey may need only
50 respondents for a pilot test, whereas a larger, more complex survey
may need 200 or more respondents.


ANALYSIS OF TEST RESULTS

Implications for questionnaire and survey design

The results of survey testing can bring to light a number of problems
with questionnaire and survey design. Some of these problems may be 
identifiable during a skirmish; others may be identifiable only when the 
questionnaire is administered to a sample of the target population. 
Examples of the types of problems that could be highlighted with 
questionnaire design are: 


= non-response to a particular question by a number of respondents;
= multiple answers being given when only one should be chosen;

= a large number of ‘other’ or ‘not applicable’ responses to a particular
question;

= lack of variation in the responses to a question; and
= general misunderstanding of a question.
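
Several of these symptoms can be counted directly from pilot-test data. The sketch below is one possible way of doing so for a single-choice question. The data layout (None for no answer, a list where more than one box was ticked) and the function name are assumptions made for illustration.

```python
from collections import Counter


def question_diagnostics(answers, other_codes=("other", "not applicable")):
    """Summarise pilot-test answers to one single-choice question.

    answers holds one entry per respondent: None for no answer, a list
    when more than one box was ticked, otherwise the chosen category.
    """
    n = len(answers)
    if n == 0:
        raise ValueError("no answers supplied")
    missing = sum(1 for a in answers if a is None)
    multiple = sum(1 for a in answers if isinstance(a, list))
    single = [a for a in answers if a is not None and not isinstance(a, list)]
    counts = Counter(single)
    other = sum(counts[c] for c in other_codes)
    # Share of valid answers falling in the most popular category;
    # a value near 1.0 suggests little variation in the responses.
    top_share = max(counts.values()) / len(single) if single else 0.0
    return {
        "non_response_rate": missing / n,
        "multiple_answer_rate": multiple / n,
        "other_rate": other / n,
        "top_category_share": top_share,
    }


pilot = ["yes", "yes", "no", None, ["yes", "no"], "other", "yes", "yes"]
print(question_diagnostics(pilot))
```

What counts as too high a rate is a judgement for the researcher. General misunderstanding of a question usually has to be picked up from interviewer feedback rather than from counts.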


The solution to these problems may lie in the sequencing of questions, 
the instructions accompanying questions, the wording of questions and 
the answer categories provided for questions. Changing the answer 
categories can produce more variation in the responses given to a 
question. Some techniques for doing this include: 


= increasing the list of answer categories (while taking care not to make 
lists too lengthy); 


= changing the order of this list; and
= changing the emphasis of the question and/or the answer categories.


In addition to problems with questionnaire design, survey testing may 
also uncover problems with the: 


= sampling frame;
= sampling methodology;
= sample size; 


= assumptions regarding population variability used in designing the 
survey; 
= sources of non-sampling error;
= survey administration; and

= processing of the survey data, particularly testing the suitability of
forms for data preparation.


Once identified, these problems can be addressed and either eliminated
or minimised before the full survey is conducted. When used in this way,
pilot testing is an invaluable tool for maximising, within resource
constraints, the quality of results obtained from the final survey.


‘Previewing’ survey results

The results of pilot testing and dress rehearsals can also be used to
provide a ‘preview’ of the results of the full survey. The data gained from
the test can be analysed in the same way as the final survey,
incorporating tables, statistical analyses (e.g. correlations, scales, etc.) and
discussion of the findings.


SUMMARY

Survey testing is an important part of preparing a survey as it enables
problems to be identified and corrected before the full survey is 
conducted. This saves spending resources on a survey whose results may 
be invalid. Sufficient planning and funds should be allocated to the 
testing stage in order to ensure that the final survey produces meaningful 
and valuable results. 
CHAPTER 7 SURVEY AND CONTRACT MANAGEMENT


INTRODUCTION

Survey management involves organising and controlling each aspect of a
survey being undertaken, while contract management involves ensuring 
that the work done by a contractor who has been engaged to conduct all 
or part of the survey is done according to the agreed specifications. 


Rather than taking on the responsibilities of directly managing all aspects 
of a survey, it may be advantageous to contract a consultant to undertake 
some or all of the associated tasks. Depending on needs and budget, this 
may range from questionnaire design and sample selection, through to 
data collection, data processing and report production. 


Whether it is decided to conduct the survey totally in-house or to engage 
another organisation to undertake part or all of the survey development 
and implementation, it is worthwhile being aware of all the various 
aspects so that the process can be more effectively managed. 


The various aspects of a survey include: 


= estimating resource requirements, and monitoring and managing 
resources; 


= developing the survey frame, developing the survey instrument,
selecting the sample;

= systems development;
= developing a despatch and collection control system;
= determining employment conditions of interviewers and office staff;
= recruitment and training of interviewers and office staff;
= allocating and managing workflows;
= maintaining personnel records etc.;

= paying interviewers and office staff (including checking travel claims
and any other expenses);

= establishing and maintaining office supplies and providing interviewer
equipment;

= monitoring the survey’s progress;

= undertaking a quality assurance program including adequate testing
and evaluation;

= data processing;
= output preparation and presentation; and

= establishing a survey authority process, e.g. primary approach letters,
brochures, identification cards, etc.
FINANCE

The financial resources available are often a deciding factor influencing
the design of a survey. It is therefore important to make financial
estimates for each aspect of a survey and to monitor spending closely.
When preparing these financial estimates, it is also essential to ensure
that sufficient time as well as funding has been allocated to each stage of
the conduct of the survey, with particular emphasis on activities such as
systems development, training (including documentation) and evaluation.


Adequate resources should be allocated to survey management as it plays
a central role in the conduct of a survey, particularly its most labour
intensive part. Financial estimates for other aspects of a survey generally
fall into three categories: overheads, salaries, and survey and processing
costs. Specifically, some of the aspects for which financial estimates need
to be made are:


= adequate testing of all facets of the survey;

= hire of rooms (for administration, training of staff, and processing of
the survey data);

= systems development and data processing;
= office equipment, printing and postage; and
= salaries and any travel costs including a component for training.


TRAINING

The adequacy of training given to interviewers and processing staff has a
strong influence on the quality of results obtained from a survey. 
Thorough training of interviewers is important because they have a wide 
range of tasks to perform and are the main point of contact between 
respondents and researcher. Comprehensive training of office staff should 
enable them to process the survey questionnaires as accurately and as 
quickly as possible. 


Training can be provided in the form of manuals, formal training 
courses, and ‘on-the-job’ training. Topics covered in interviewer training 
should include: 


= the purpose of the survey;
= the scope and coverage of the survey;
= a general outline of the adopted sampling approach;
= the questionnaire;
= recording answers;
= interviewing techniques;
= avoiding or reducing non-response;
= maintaining cooperation;
= field practice;
= quality assurance;
= editing;
= planning the workload; and
= administrative arrangements.


For processing staff, training should cover: 


= the purpose of the survey;
= the scope and coverage of the survey;
= the questionnaire;
= recording of answers;
= coding;
= editing;
= data entry instructions, if appropriate;
= quality assurance; and
= administrative arrangements.


It is also useful to include some form of study exercise to test 
understanding of the topics mentioned above. Both interviewers and 
processing staff will improve as they gain practical experience ‘on the 
job’ and consolidate their formal training. 


SUMMARY The operational aspects of a survey require thorough planning and 
efficient management of financial and human resources. In the course of 
addressing survey management issues, a researcher may find that the 
survey needs to be modified before it can be conducted feasibly. Since 
survey management is an integral part of a survey, sufficient resources 
should be allocated to it to ensure that the survey runs smoothly and 
produces reliable, timely results. 
CHAPTER 8 COLLECTING THE DATA


INTRODUCTION

The procedures to be followed in collecting survey data depend largely
on the technique chosen, i.e. self-enumeration or personal interview
(see chapter 2).


SELF-ENUMERATION SURVEYS

As outlined in more detail in chapter 2, self-enumeration surveys include:


= those where questionnaires are sent out and returned through the
post (postal surveys); and

= those where questionnaires are delivered and/or returned by hand
(hand-delivered surveys).


Postal surveys

For postal surveys a questionnaire is generally mailed out with a covering
letter and a reply-paid envelope. The completed questionnaire is then 
returned in the envelope provided. Outstanding questionnaires can be 
followed up either by written reminders or by telephone. 


Stationery requirements 

The basic stationery requirements for postal surveys include 
questionnaires, covering letters, reminder letters, envelopes, reply-paid 
envelopes, and, if desired, adhesive address labels. It is a good idea to 
consult the post office on matters such as preferred article sizes, postage 
rates, and requirements for reply-paid envelopes. 


To estimate the number of questionnaires, letters, and envelopes 
required, information from a previous, similar survey can be used (if 
available). If such information is not available, estimates must be based 
on the size of the sample and the expected response rate. For example, 
a survey obtaining a high initial response rate requires fewer reminder 
letters, envelopes, and questionnaires than a survey obtaining a low 
initial response rate. Enclosing an additional questionnaire and reply-paid 
envelope with reminder letters can boost the final response rate, 
especially where the original questionnaire has been misplaced or 
discarded by a respondent. Estimates of stationery requirements should 
therefore include a component for such additional questionnaires and 
reply-paid envelopes. 
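
Where no previous survey is available, the estimate can be built up from the sample size and the expected response rates. The following is a rough sketch only. It assumes that every reminder encloses a replacement questionnaire and reply-paid envelope, and that a small allowance for spoilage is wanted. The function name, parameters and figures are hypothetical.

```python
import math


def _round_up(x):
    # Round first so that floating-point noise cannot add a spurious unit.
    return math.ceil(round(x, 6))


def stationery_estimate(sample_size, response_before_reminder, spare=0.05):
    """Estimate stationery quantities for a postal survey.

    response_before_reminder lists the expected cumulative response rate
    just before each reminder round, e.g. [0.40, 0.60] for two rounds.
    spare is the proportion added as an allowance for spoilage.
    """
    reminders = sum(
        _round_up(sample_size * (1 - rate)) for rate in response_before_reminder
    )
    mailings = sample_size + reminders
    return {
        "covering_letters": _round_up(sample_size * (1 + spare)),
        "reminder_letters": _round_up(reminders * (1 + spare)),
        "questionnaires": _round_up(mailings * (1 + spare)),
        "mail_out_envelopes": _round_up(mailings * (1 + spare)),
        "reply_paid_envelopes": _round_up(mailings * (1 + spare)),
    }


print(stationery_estimate(1000, [0.40, 0.60]))
```

Lowering the expected response rates in the second argument shows how quickly a poor initial response increases the stationery required.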


Labelling and dispatch 

Before questionnaires are labelled and dispatched, a register (called a 
collection control register) should be compiled listing all respondents 
selected in the survey. The register should list each respondent’s name, 
postal address, and some form of unique identification. (The 
identification, which is usually a number, enables outstanding 
questionnaires to be identified and followed up.) 


When labelling questionnaires, the respondent’s name, postal address, 
and identification number should appear on the covering letter and 
mail-out envelope (if window envelopes are not used). The identification 
number should also appear on the questionnaire so that returned 
questionnaires can be marked off the collection control register either 
manually or perhaps using OCR technology. 
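
A computer-based collection control register can be as simple as one record per selected respondent, keyed by the identification number. The sketch below is illustrative only. The class, field and method names are assumptions and do not describe any particular system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class RegisterEntry:
    ident: int                      # unique identification number
    name: str
    postal_address: str
    returned_on: Optional[date] = None
    reminders_sent: List[date] = field(default_factory=list)


class CollectionControlRegister:
    def __init__(self, entries):
        self.entries = {e.ident: e for e in entries}

    def mark_returned(self, ident, when):
        """Mark a returned questionnaire off the register."""
        self.entries[ident].returned_on = when

    def record_reminder(self, ident, when):
        """Note the date on which reminder action was taken."""
        self.entries[ident].reminders_sent.append(when)

    def outstanding(self):
        """Identification numbers of questionnaires not yet returned."""
        return sorted(i for i, e in self.entries.items() if e.returned_on is None)


register = CollectionControlRegister(
    [RegisterEntry(i, f"Respondent {i}", f"{i} Example Street") for i in (1, 2, 3, 4)]
)
register.mark_returned(1, date(1999, 3, 8))
print(register.outstanding())
```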


46 ABS - AN INTRODUCTION TO SAMPLE SURVEYS: A USER'S GUIDE + 1299.0 - 1999 


Adhesive labels are a commonly used means of labelling letters, 
envelopes, and questionnaires. Modern printing techniques also allow for 
direct printing onto questionnaires of label details. The labels can be 
printed from the collection control register, either by manual 
transcription or by computer. For surveys of moderate or larger size, 
computer generation provides the most efficient method of printing 
labels. Although computer systems can be costly to establish and 
maintain, the advantages of generating labels by computer are that 
names, addresses and identification numbers need to be entered only 
once, or can be generated directly from the collection control register. If 
the register is computer-based, labels can be produced quickly and if 
necessary, extra labels can be produced easily at a later date (e.g. for 
reminder letters). 


When the questionnaires, covering letters, and envelopes have been 
labelled, the questionnaires and letters must be inserted into the 
envelopes. Although this is straightforward, mistakes can occur, e.g. the 
labels on the letter and/or questionnaire may not match the label on the 
envelope. One way of avoiding this is to use window envelopes, but care 
must still be taken to ensure the labels on the letter and questionnaire 
match. Another common mistake is for something to be omitted from the 
envelope. Errors such as these can create bad impressions and cause 
non-response. To avoid such problems, materials should be 
well-organised and spot checks should be conducted as a quality control 
measure. 


Reminder action 

As questionnaires are returned, their identification number should be 
marked off the collection control register. This can be done either 
manually or by computer, depending on the nature of the register. The 
control register can then be used to identify which questionnaires need 
to be followed up. Labels for reminder letters or reminder letters 
themselves can also be produced from the register, either manually or by 
computer. 


Reminder action should be timed to coincide with the due date and/or a
drop in the rate at which questionnaires are returned. To identify this
time, the response rate should be calculated and updated regularly,
e.g. daily (the rates can be used to plot a graph, as a direct visual aid).
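
The daily calculation is straightforward once the day of return is recorded for each questionnaire. As a hedged sketch, the functions below compute the cumulative response rate for each day after despatch and flag when returns have slowed. The window and threshold used to define a ‘drop’ are arbitrary assumptions that would need to be set for each survey.

```python
from collections import Counter
from itertools import accumulate


def cumulative_response_rates(return_days, sample_size, last_day):
    """Cumulative response rate (%) for each day 1..last_day after despatch.

    return_days holds, for every returned questionnaire, the day number
    on which it arrived.
    """
    per_day = Counter(return_days)
    daily = [per_day.get(day, 0) for day in range(1, last_day + 1)]
    return [100 * total / sample_size for total in accumulate(daily)]


def returns_have_slowed(rates, window=3, threshold=1.0):
    """True if the rate rose by less than threshold points over window days."""
    if len(rates) <= window:
        return False
    return rates[-1] - rates[-1 - window] < threshold


rates = cumulative_response_rates([1, 1, 2, 3, 3, 3, 5], sample_size=20, last_day=6)
print(rates)
print(returns_have_slowed(rates, window=1))
```

The list of rates can be passed to any plotting tool to produce the graph described above.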
[Figure: response rate (%) plotted against days post despatch, with the
horizontal axis marked at days 6, 11(a), 16, 21(b) and 25.]

(a) First reminder sent.
(b) Second reminder sent.


Reminders (whether by letter or telephone) should then be effective
without being wasteful. The collection control register should also be
annotated to show:


= which respondents have received reminders;
= the date(s) on which reminder action was taken;
= any comments or queries from respondents; and
= which questionnaires have been returned as unclaimed mail.


Hand-delivered surveys

In hand-delivered surveys, it is generally not essential to know
respondents’ names. A questionnaire is generally delivered personally by
an ‘interviewer’ or collector who introduces and explains the survey to
the respondent. The respondent completes the questionnaire in his/her
own time and either returns it in the reply-paid envelope provided or
gives it to the collector on his/her return. Outstanding questionnaires can
be followed up by means of reminder letters or ‘call backs’ by the
collector or both. In other respects, the collection procedures and
processes for hand-delivered surveys are basically the same as for postal
surveys.


INTERVIEW SURVEYS

As discussed in chapter 2, interview surveys can generally be conducted
by one of two methods: face-to-face interviews and telephone interviews.


Face-to-face interview surveys

Many surveys are conducted by face-to-face interview because the units
selected in the sample cannot be identified by name and address. In 
such cases, areas are selected and then, within these areas, 
dwellings/shops/factories/etc. can be further selected, according to the 
nature of the survey (see chapter 4 Cluster sampling and Two-stage and 
multistage sampling). 
Workload allocation

Each interviewer is given a workload, or group of interview selections, to 
complete. When determining the size of a workload, consideration 
should be given to the: 


= time taken for each interview; 

= time available for interviewing; 

= distances to be travelled;
= complexity of the questionnaire;
= number of interviewers available;

= total number of interviews to be conducted; and

= expected number of ‘call backs’ required.
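
These factors can be combined into a rough first estimate of workload size. The sketch below is a simplification under stated assumptions: every interview involves one trip, each call back costs one further trip, and questionnaire complexity is reflected only in the interview time. The function names and figures are hypothetical.

```python
import math


def workload_size(hours_available, interview_minutes, travel_minutes,
                  call_back_rate):
    """Approximate number of interviews one interviewer can complete.

    call_back_rate is the expected number of call backs per completed
    interview; each call back is assumed to cost one extra trip.
    """
    minutes_per_interview = (
        interview_minutes + travel_minutes * (1 + call_back_rate)
    )
    return math.floor(hours_available * 60 / minutes_per_interview)


def interviewers_needed(total_interviews, workload):
    """Number of interviewers required to cover all interviews."""
    return math.ceil(total_interviews / workload)


size = workload_size(hours_available=40, interview_minutes=30,
                     travel_minutes=20, call_back_rate=0.5)
print(size, interviewers_needed(1000, size))
```

If fewer interviewers are available than the calculation suggests, either the interviewing period or the sample size has to be reconsidered.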


Depending on how far interviewers live from the areas where 
interviewing is to be conducted, workloads can be distributed: 


= at training or briefing sessions; 
= by mail; 
= by courier; or 


= by interviewers collecting them from a central point. 


Interviewing 

Interviewers should check that all necessary documents have been 
obtained before starting to interview. It is a good idea to try to complete 
as many interviews as possible early in the interview period. This allows 
time both for ‘call backs’ to unavailable respondents and for clarifying 
any problems on the questionnaires. 


Interviews should be conducted at a time convenient to respondents, 
even if this means calling back at a later date. In particular, it is wise to 
avoid calling before 9.00 a.m. or after 8.00 p.m. 


The interviewer’s opening remarks and the manner in which they are 
made have a strong influence on respondents’ reaction and their 
willingness to cooperate. Before any questions are asked, the interviewer 
should: 


= give his/her name; 
= explain that a survey is being conducted and by whom;

= provide an identification document and give the respondent time to
read it (the document should include the telephone number of the
survey manager or supervisor);

= explain that the respondent’s household/business/etc. has been
selected in the sample for the survey; and

= briefly explain the purpose of the survey.


[Figure: sample interviewer identification card. It shows the name,
address and telephone number of the survey organisation, a photograph of
the interviewer, and the text: ‘This is to certify that ......... is a
member of our interviewing panel. Should you have any query about the
background to our organisation or purpose of this survey, we would
welcome your enquiry.’ The card ends with ‘Authority expires on .........’
and ‘Signed for organisation .........’.]


Some respondents may wish to satisfy themselves that the survey and 
interviews are genuine. If respondents have questions about the survey, 
the interviewer should answer them genuinely, drawing on knowledge 
gained during his/her training. 


In addition to the interviewer’s attitude and ability to answer 
respondents’ questions, the interaction between interviewer and 
respondent is crucial for gaining and maintaining respondents’ 
cooperation. Some techniques the interviewer can use to enhance this 
interaction are to: 


= listen attentively;
= allow a respondent to relate personal experiences;
= keep the interview time short; and

= refrain from any suggestion that one answer is more acceptable than
another to the interviewer.


A well-designed questionnaire should include instructions to guide the 
interviewer through the questionnaire. It is also important for the 
interviewer to have a thorough knowledge of the questionnaire so that 
the interview can proceed smoothly. It is essential that all questions are 
asked exactly as worded on the questionnaire. This avoids the possibility 
of a question taking on a different meaning and introducing bias to the 
results. If a respondent has difficulty understanding a question, it should
be repeated more slowly and, if necessary, explained further. However,
the interviewer should avoid rewording the question in later interviews.
Respondents’ answers should be recorded clearly so that they are
unambiguous to office processing staff. Answers are usually recorded by
inserting a tick or a number in the appropriate boxes, or by writing an
exact account of the respondent’s answer in the space provided on the
form.


To close the interview the interviewer should thank the respondent for
his or her cooperation and check to see if the respondent has any
further questions about the survey. Respondents should be advised if any
additional or follow-up interviews are planned.


Editing

After each day’s interviews have been completed, interviewers should
check questionnaires for completeness, accuracy, and correct sequencing
of questions. If necessary, ‘call backs’ for any missing information can
then be planned.


Non-response

Any cases of refusal should be notified to the survey manager as soon
as possible. The survey manager should also be notified of any
questionnaires which are retained for ‘call backs’ after the date set for
returning questionnaires to the office.


Telephone interview

Where sample members can be identified by name and telephone
number, and survey objectives and sample design are compatible with 
telephone use, use of telephone interviewing can greatly reduce both the 
cost of the survey and the time elapsed before survey results are 
available. 


Questionnaires for telephone interviewing generally need to be much 
briefer and simpler than for face-to-face interviewing as it is much easier 
for a respondent to end the interview prematurely (see chapter 2). 


Workload allocation

The size of a telephone interviewer’s workload should be determined
realistically according to the:


= time for each interview;
= time available for interviewing;
= complexity of the questionnaire;
= number of interviewers;
= total number of interviews to be conducted; and
= expected number of telephone ‘call backs’ required.


As telephone interviewing is generally conducted from an office, 
workloads are generally distributed at the office. 
Interviewing

Many of the principles of face-to-face interviewing apply also to telephone
interviewing. If anything, the techniques for gaining and maintaining
cooperation are even more crucial to the success of telephone
interviewing than face-to-face interviewing because of the:


= greater difficulty in establishing rapport without the assistance of
non-verbal cues; and

= greater ease with which respondents can end a telephone interview.


Editing

Following interviewing, telephone interviewers should check their
questionnaires for completeness, accuracy, and correct sequencing of
questions (as with face-to-face interviewing). With CATI, however, editing
is conducted ‘on line’ during the course of the interview itself.


Non-response

The procedures for dealing with cases of refusal are basically the same
for telephone interviewing as for face-to-face interviewing.


QUALITY ASSURANCE

The basic aim of any quality assurance program is to ensure that, within
resource constraints, errors are minimised. The key to quality assurance 
and improvement is to be able to regularly measure the cost, timeliness 
and accuracy of a given process so the process can be improved when a 
fall in quality is indicated. The emphasis is on process improvement 
rather than correction. 


Quality assurance should be undertaken at all stages of a survey, to 
enhance the quality of subsequent stages and the final results. For 
repeating surveys, changes in processes and procedures identified can be 
implemented in subsequent cycles of the surveys to improve the quality 
of the results. 


The key elements of quality assurance are preparation and evaluation. In 
developing a survey, there are elements of the process which are 
essential to realising quality outcomes. In terms of preparation, 
researchers should test their proposed methodology and survey technique 
to ensure that the correct tools have been selected and that the 
procedures specified are appropriate. It is also essential to prepare 
documentation covering each aspect of the survey well in advance and to 
prepare comprehensive training programs for each aspect of the survey. 


Continuing evaluation of the strategies, techniques and procedures 
employed for the survey is also necessary to ensure that the lessons 
available from the day to day work of conducting the survey are used to 
enhance the quality of the survey outcome. 


Auditing is the final aspect of quality assurance activities with potential to 
impact on the quality of survey outcomes. While testing, effective 
preparation and evaluation are the crucial components of quality 
assurance, some auditing of processes will provide indicators of the 
reliability and appropriateness of the techniques in use for the survey.
This might take the form of probity checking of interview work or,
perhaps, re-coding of survey responses. Both of these approaches can
provide early and accurate sources of feedback about the performance of
the techniques chosen to conduct the survey. This feedback can be used
to improve these processes.


Please note that if monitoring of telephone interviews is a chosen quality
assurance method, respondents must be warned that the telephone
conversation may be monitored for quality assurance purposes.


Post-enumeration survey

A post-enumeration survey or study (PES) is a follow-up interview with a
sample of respondents and non-respondents after a survey has been
conducted in the field, with the aim of evaluating the quality of the data.
This may be done either through face-to-face interviews or by telephone,
using questions about how the respondent completed the form and
comparing the original responses with those obtained in the PES.


A PES is conducted to uncover consistent errors made by respondents,
and to find out why those errors are occurring. Particular aspects can be
investigated in detail if they are suspected problem areas.


In practice a PES sample will generally be selected from those who
responded to the original survey or special study conducted. This will
most likely be a list of ‘potential’ PES respondents from a particular area
chosen to conduct respondent visits. Ideally this list should be randomly
ordered so that the final PES respondents are a random sub-sample. If
dimensions other than area (e.g. size of business) need to be considered,
the profile of the final selections should be checked for
representativeness, as many respondents will decline.


SUMMARY

Regardless of the method used to collect the survey data, the actual
process of collecting data involves considerable preparation and 
organisation. Procedures need to be well planned and easily translatable 
into action. If sufficient resources are allocated to an effective quality
assurance program, potential problem areas can be identified at an early
stage, allowing adjustments to the processes and procedures to be made
promptly and minimising the negative impact on survey results.
CHAPTER 9 DATA PROCESSING


INTRODUCTION Data processing involves translating the answers on a questionnaire into 
a form that can be manipulated to produce statistics. In general, this 
involves coding, checking, data entry, editing and monitoring the whole 
data processing procedure. 


PROCESS CONTROL The main aim of the various stages of data processing is to produce a file 
of data that is free from errors. Adopting a methodical and consistent 
approach to each task is important to ensure that the processing is 
completed satisfactorily and on schedule. 


Documentation Comprehensive written instructions are required to specify data 
processing procedures, addressing most issues which can arise. These 
instructions should include: 


= code lists; 

= clerical and computer editing guidelines; 

= guidelines for using the computer system (if one is used);
= a timetable; and 


= a list of contact names indicating who is in charge of the various 
aspects of the survey. This enables the survey processing staff to know 
whom to approach if problems arise. 


Timetabling A timetable is essential for planning purposes and for ensuring that
adequate resources are allocated to the various stages of processing. For 
example, there may be time restrictions on the availability of key punch 
operators. This would require completed questionnaires to be available at 
the time key punch operators are available. 


Distributing copies of the timetable to all processing staff helps to ensure 
that the data are processed on schedule. The timetable should be 
constructed realistically, taking into consideration the: 


= number of processing staff available;

= number of questionnaires to be processed;

= length and complexity of the questionnaires;

= expected number of errors needing clarification with respondents; 


= expected number of days off due to public holidays, sickness, etc.; 
and 


= date by which survey results are to be available. 
Process control register

A register containing information about the processing of the
questionnaires is a useful method of enhancing the accuracy of survey
results. The register should indicate which questionnaires have been:


= sent for punching;
= edited;
= declared ‘clean’, that is, free from errors; and
= when ‘clean’, loaded on to the final data file.


The process control register can then be used to ensure that all
questionnaires to be included in the final results have passed
satisfactorily through the processing procedures.


Quality assurance

Quality assurance measures for the data processing stage include
selectively checking that questionnaires have been correctly: 


= coded;
= punched/entered into a processing system;
= edited; and

= acknowledged as being final or ‘clean’ after passing through the
processing stages. 
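
A process control register and the selective checks described above can both be kept in a small program. The following sketch is illustrative only. The stage names follow the register entries listed earlier, while the class, the rule that stages are passed in strict order, and the checking fraction are assumptions.

```python
import random
from enum import Enum


class Stage(Enum):
    RECEIVED = 0
    PUNCHED = 1
    EDITED = 2
    CLEAN = 3
    LOADED = 4


class ProcessControlRegister:
    def __init__(self, questionnaire_ids):
        self.stage = {q: Stage.RECEIVED for q in questionnaire_ids}

    def advance(self, qid, new_stage):
        """Move a questionnaire to the next stage; stages cannot be skipped."""
        current = self.stage[qid]
        if new_stage.value != current.value + 1:
            raise ValueError(
                f"{qid}: cannot move from {current.name} to {new_stage.name}"
            )
        self.stage[qid] = new_stage

    def not_yet_loaded(self):
        """Questionnaires that have not reached the final data file."""
        return sorted(q for q, s in self.stage.items() if s is not Stage.LOADED)


def qa_sample(questionnaire_ids, fraction, seed=None):
    """Randomly select a fraction of questionnaires for re-checking."""
    ids = sorted(questionnaire_ids)
    k = max(1, round(len(ids) * fraction))
    return sorted(random.Random(seed).sample(ids, k))


register = ProcessControlRegister([101, 102, 103])
for stage in (Stage.PUNCHED, Stage.EDITED, Stage.CLEAN, Stage.LOADED):
    register.advance(101, stage)
print(register.not_yet_loaded())
print(qa_sample(range(1, 201), fraction=0.05, seed=1))
```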


Other crucial aspects of effective quality assurance for the data processing 
stage include the preparation of accurate and comprehensive 
documentation and the provision of comprehensive training for staff 
involved in each stage of the survey. Once processing commences,
analysis of repeating errors will assist in ‘fine-tuning’ editing procedures,
indicate where retraining may be necessary and, for any subsequent
iteration of the survey, provide input to redevelopment.


During processing, checks can also be made on the performance of the 
interviewers and, to some extent, the questionnaire itself. Interviewers’ 
error rates and any problems with particular questions or sequencing 
may indicate the need for some retraining of interviewers or amendments 
to instructions. For smaller surveys, this information may not be useful to 
the current survey but any subsequent surveys should benefit. 
Alternatively, conducting a pilot test before the full survey should enable 
this sort of information to be utilised. 


The extent of non-response can also be monitored during processing, 
especially with respect to partial non-response. Questionnaires that are 
returned incomplete can be extracted for follow-up action with as little 
delay to the processing schedule as possible. 
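

Monitoring partial non-response during processing can be as simple as listing, for each returned questionnaire, the questions left unanswered. The sketch below is a minimal Python example under stated assumptions: the question identifiers are hypothetical and `None` is used to mark a missing answer.

```python
REQUIRED = ("q1", "q2", "q3")  # hypothetical question identifiers


def missing_items(form):
    """Required questions with no answer recorded on one returned form."""
    return [q for q in REQUIRED if form.get(q) is None]


returned = {
    "Q001": {"q1": 1, "q2": 3, "q3": 2},
    "Q002": {"q1": 2, "q2": None, "q3": 1},
    "Q003": {"q1": 1},
}

# Forms to extract for follow-up, with the questions still outstanding.
follow_up = {
    qid: missing_items(form)
    for qid, form in returned.items()
    if missing_items(form)
}

# Share of returned forms with at least one item missing.
partial_rate = len(follow_up) / len(returned)
```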


It is generally not possible for processing to improve the accuracy of 
data. At best, it may reduce some inconsistencies. It is important to 
incorporate adequate testing, training, documentation and evaluation at 
early stages to improve the quality of data. 


ABS + AN INTRODUCTION TO SAMPLE SURVEYS: A USER'S GUIDE - 1299.0- 1999 55 


CODING


Unless all the questions on a questionnaire are ‘closed’ questions, some
degree of coding is required before the survey data can be sent for
punching or analysed. The appropriate codes should be devised before
the questionnaires are processed, and are usually based on the results of
pilot tests (see chapter 6 Survey testing).


Most surveys are too large and complex to be analysed by checking
questionnaires and counting responses by hand; the researcher could not
cope with the volume of answers. Surveys therefore usually require a
system by which the responses can be transferred onto a computer file
for analysis. This system involves translating all responses into
numerical codes.


Coding consists of labelling the responses to questions in a unique and
abbreviated way (using numerical codes) so as to facilitate data entry and
manipulation. Codes should be simple and easy to apply; for example, if
question 1 has four possible responses, those responses could be given
the codes 1, 2, 3 and 4.


The coding frame for most questions can be devised before the main
interviewing begins. That is, the likely responses are known from
previous similar surveys or from pilot testing, allowing those responses
and the relevant codes to be printed on the questionnaire. An 'other answer'
code is often added to the end of a coding frame, with space for
interviewers to write the answer. The standard instruction to interviewers
in doubt about any precoded responses is that they should write the
answers on the questionnaire in full so that they can be dealt with by a
coder later.


In some cases, however, the final coding has to be done after fieldwork
as part of the data preparation task. The open-ended responses on each
questionnaire have to be assigned numerical codes according to either
coding frames formulated from pilot tests and dress rehearsals or certain
standard coding systems such as the Australian Standard Geographical
Classification (see below).


Classification systems


A major function of the Australian Bureau of Statistics (ABS) is the
development of statistical standards, classifications, and frameworks.
These classifications cover a wide range of subjects. A list of some useful
ABS classifications is presented in Appendix 4.


ABS classifications should always be considered when designing questions
for a questionnaire because they are easily coded and allow for
comparison with data from ABS and other sources. The ABS can provide
assistance to organisations planning to use ABS statistical classifications
through its Statistical Consultancy Service (see contact details on the
last page).


CHECKING


During the data preparation stages of a survey the completed 
questionnaires have to be checked and transferred into a format suitable 
for analysis, whether it be clerical or computer analysis. The main 
functions involved in data preparation are coding and editing. 
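

Of the two functions just named, coding can be pictured as a lookup from each response to a numerical code. The following is a minimal Python sketch, not an ABS procedure: the question, the response categories, the code values (including 9 for an 'other answer') and the function name `code_response` are all hypothetical.

```python
# Hypothetical coding frame for one question: each precoded response
# maps to a unique numerical code.
CODING_FRAME = {
    "employed full-time": 1,
    "employed part-time": 2,
    "unemployed": 3,
    "not in the labour force": 4,
}
OTHER_CODE = 9  # assumed code for the 'other answer' category


def code_response(answer: str):
    """Return (code, verbatim) for one response.

    Answers outside the frame receive the 'other answer' code and the
    verbatim text is kept so that a coder can deal with it later.
    """
    key = answer.strip().lower()
    if key in CODING_FRAME:
        return CODING_FRAME[key], None
    return OTHER_CODE, answer.strip()


answers = ["Unemployed", "employed full-time", "Casual seasonal work"]
coded = [code_response(a) for a in answers]
```

Keeping the verbatim text alongside the 'other answer' code mirrors the instruction to interviewers to write doubtful answers in full.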


In most surveys there is both clerical and computer editing. This
inevitably leads to some double checking, but to dispense with one of
these edits would allow more errors to pass through to the data analysis
stage.


The main purpose of clerical editing is to detect and correct
errors made during enumeration. When carried out concurrently with
interviewing, clerical editing can prevent errors by providing feedback to
the interviewers on their mistakes. These mistakes could include writing
down ambiguous answers, misunderstanding sequencing instructions or
entering numerical answers incorrectly, e.g. 02.4 instead of 2.40.


Clerical editing can save time and money but it should be carefully
targeted to ensure that the most effective use of resources is made.
Commonly, editing is targeted at those respondents that make the
largest contribution to estimates, and/or those respondents which come
from large units such as businesses with a large number of employees.


Clerical editing should only be used to check for very obvious errors and
in circumstances when information on the questionnaire may be needed
to resolve a query. Comprehensive checking of questionnaires should be
handled by computer. Checks that would be both tedious and
uneconomic to handle clerically can be handled quickly, easily and more
comprehensively by a computer edit. Further, errors can be introduced
during the data preparation stages themselves (clerical editing, coding or
punching), so a final check is necessary before the analysis to ensure the
data are final or ‘clean’.


DATA ENTRY


Up to this chapter, the questionnaire has been considered mainly as a
means of communication between interviewer and respondent. Its other,
and just as important, role is as a working document for coders and
computer operators, and as a medium for the transfer of data on to a
computer file. By using the questionnaire in this second role as a data
input document we are removing the need for a separate data input
form. This also removes a stage that could produce transcription errors.


Designing a questionnaire so as to facilitate data entry is discussed in
chapter 3.


Optical Mark Recognition


Optical Mark Recognition (OMR) is a form scanning method whereby 
responses are read into a computer without a keyboard. These responses 
are then transformed into code. Both the 1991 and 1996 Censuses of 
Population and Housing were ‘data captured’ in this manner. 


How it reads 

The OMR’s software is programmed to read predetermined areas of a
page on which responses are ‘expected’ to appear. The page is fed into
the machine and the ‘clock tracks’ printed down the side of each page
tell the OMR when to switch on its bank of ‘read heads’ to capture a
response. The area in which a mark is expected may be scanned up to
seven times in one second to detect a mark. This is called ‘mark
sensing’.


The number of clock tracks printed down the page is limited only by the
question responses that can be given.


The paper forms are still required in later processing.


Intelligent Character Recognition


Intelligent Character Recognition (ICR) is similar to OMR but a computer
image of the whole page is taken (both sides).


How it reads

The page containing responses is fed into scanning equipment and an
image (either black and white or colour) is taken of both sides of the
page. The images are then put through recognition software to interpret
the responses. Both tickbox and handwritten responses in specific areas
of the page are recognised. The ‘unrecognised’ responses are sent on to a
‘repair’ stage of processing for amendment via human intervention. The
responses, both text and tickbox, are reformatted into a code for later
processing.


Advantage of ICR over OMR


The paper-based pages of a form are no longer required for further
processing as this can all take place from the image. This is a huge
advantage over OMR in that the paper forms can be stored offsite or
destroyed, freeing up space both within a building and on a workstation
desk. The time and cost savings in not moving the pages physically from
process to process around a building are quite substantial.


The ICR or Optical Character Recognition (OCR) machine can read the
tickbox responses better than is the case with OMR. It achieves this
because it takes an image of the page and this process is better for mark
recognition, therefore increasing data quality and overall throughput.


Another advantage of ICR over OMR is the large amount of data that ICR
can electronically capture from a page. The data are then input to an
automatic coding program, which saves on staff years and further
increases quality.


ICR/OCR is rapidly becoming the industry standard solution for the
capture of form-based data. This is heavily supported by the amount of
software available and an ever increasing interest in faster and more
accurate data capture methods.


EDITING


It is important to note that even though we go through comprehensive 
editing processes errors may still occur. Editing can identify only 
noticeable errors; it can correct items of information wrongly given by 
respondents or wrongly transcribed by interviewers only when there are 
clues that point to the error and provide the solution. Thus the final 
computer file will not be error-free, but should be internally consistent. 


Records should only be transferred to the final computer file after they 
have passed through all the edit checks without a failure. A record that 
fails even one check should not be transferred. 
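

The transfer rule above, together with edit checks of the kinds discussed in this chapter (range, sequencing and logic checks), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production edit system: the field names (`employed`, `hours`, `age`, `ft_labour_force`), the code ranges and the individual rules are hypothetical.

```python
# Hypothetical valid code ranges for two precoded questions (1 = yes, 2 = no).
VALID_CODES = {"employed": {1, 2}, "ft_labour_force": {1, 2}}


def range_edit(rec):
    """Flag any code that falls outside the valid range for its question."""
    return [f"{q}: code {rec.get(q)!r} out of range"
            for q, valid in VALID_CODES.items() if rec.get(q) not in valid]


def sequencing_check(rec):
    """'hours' must be answered by the employed (code 1) and by no-one else."""
    answered = rec.get("hours") is not None
    if rec.get("employed") == 1 and not answered:
        return ["hours: missing for an employed respondent"]
    if rec.get("employed") == 2 and answered:
        return ["hours: answered by a respondent who should have skipped it"]
    return []


def logic_edit(rec):
    """A person under 15 cannot be in the full-time labour force."""
    if rec["age"] < 15 and rec.get("ft_labour_force") == 1:
        return ["age/ft_labour_force: inconsistent"]
    return []


def run_edits(rec):
    """Return every edit failure for one record (an empty list means 'clean')."""
    return range_edit(rec) + sequencing_check(rec) + logic_edit(rec)


records = [
    {"id": 1, "age": 34, "employed": 1, "hours": 38, "ft_labour_force": 1},
    {"id": 2, "age": 12, "employed": 2, "hours": None, "ft_labour_force": 1},
    {"id": 3, "age": 51, "employed": 3, "hours": None, "ft_labour_force": 2},
]

# Only records that pass every check are transferred to the final file.
final_file = [r for r in records if not run_edits(r)]
failures = {r["id"]: run_edits(r) for r in records if run_edits(r)}
```

Records 2 and 3 each fail one check and are held back with a list of their failures, which gives an editor the clue needed to resolve them.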


There are five main editing checks performed clerically or by computer
that should be considered.


Structure checks


These are undertaken to ensure that all the information sought has been
provided. Clerically this involves checking that all documents for a record
are together and correctly labelled. For example, a household
questionnaire and a separate questionnaire for each person in the
household may have to be identified and linked. The computer edit
could involve a check to see that all entries are present in a record.


Range edits


The range of possible codes for each question is known. In some cases
only codes 1 (yes) and 2 (no) are possible while in other cases
codes 1-9 or more may be used. The edit function in this case (mainly
computer edit) is to check that no code outside the valid range has been
entered.


Sequencing checks


For each question that should have been answered by only a subsection
of the sample, two checks need to be made: first, that all those who
should have answered the question (by virtue of a particular answer to
an earlier question) have done so; second, that no-one else has.


Duplication and omissions


Editing problems can arise where responses have been omitted. If the
omission is an isolated or minor one it can be dealt with by the insertion
of a 'not answered' code unless, of course, the answer can be deduced
from other information on the questionnaire.


The data entry process is another area that should be monitored closely.
Once the data have been entered the records should be checked for
duplication. This can happen frequently if the data entry process is not
documented fully.


Logic edits


In some cases precise logic checks can be specified in advance. For
example, a person under 15 cannot be in the full-time labour force or a
male cannot be pregnant. In other cases scrutiny may be advisable to
pick up unlikely occurrences such as a family of 18 people or a person
over 15 at primary school. They may be genuine or they may be the
result of a misunderstanding or incorrect transcription. Responses to
other questions may be checked to see if there is a clue to the
appropriate entry. Examples of other types of logical edits which can be
used include the ratio of income to expenses (in a business survey),
comparison with like data from similar or past surveys and checking of
outliers (respondents who have reported values well outside the typical
range of responses).


SUMMARY


The key points to address in developing an effective data processing
strategy are:


■ target the editing effort to large contributors and large units within
the survey population;

■ don’t over-edit;

■ automate the editing process where feasible; and

■ feed back information from the data processing stage to refine the
conduct of the survey through changes such as improvements in
question wording, questionnaire design, training and instructions.
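

The first of these points, targeting the editing effort, could be put into practice by ranking units by their contribution to an estimate and reviewing only the largest. The sketch below is one possible approach, not a prescribed method: the business records, the `turnover` field, the function name `units_to_edit` and the 80% coverage threshold are all assumptions made for illustration.

```python
def units_to_edit(units, value_key, coverage=0.8):
    """Return the largest contributors that together account for at least
    `coverage` of the total of `value_key`, largest first."""
    total = sum(u[value_key] for u in units)
    selected, running = [], 0.0
    for unit in sorted(units, key=lambda u: u[value_key], reverse=True):
        if running >= coverage * total:
            break
        selected.append(unit)
        running += unit[value_key]
    return selected


# Hypothetical business survey returns.
businesses = [
    {"id": "A", "turnover": 500},
    {"id": "B", "turnover": 320},
    {"id": "C", "turnover": 100},
    {"id": "D", "turnover": 50},
    {"id": "E", "turnover": 30},
]
priority = [u["id"] for u in units_to_edit(businesses, "turnover")]
```

In this made-up example two of the five businesses account for more than 80% of total turnover, so clerical attention would go to those two first.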


The data processing stage is an integral part of the sample survey. It is a 
stage that ensures an acceptable accuracy through process control, 
checking and editing. Coding and data entry techniques will also help 
ensure acceptable timing, and therefore costs, for the survey. 
CHAPTER 10


ANALYSIS OF RESULTS


INTRODUCTION


This chapter provides a brief outline of some methods commonly used in
analysing the results of a survey. Consultation with an experienced
statistician is recommended before any analysis work is undertaken.
(Some common statistical terms used in the analysis of results are
explained in Appendix 1.)


DESCRIPTIVE STATISTICS


Measures of location


It is often desirable to have summary measures to indicate the location of
a frequency distribution on a scale, for example a time scale. This helps
the researcher build up a picture of the distribution and facilitates
analysis. Summary measures also enable the comparison of frequency
distributions before and after a specified event (e.g. number of car
accidents before and after a change in traffic laws). A change in a
summary measure can indicate a shift in the frequency distribution.


Arithmetic mean 

The arithmetic mean is the most commonly used measure of location 
and is the ‘average’ of a set of sample values. The formula for calculating 
the mean of a set of values is: 


$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

where $\bar{x}$ is the mean value
$x_1, x_2, x_3, \dots, x_n$ are the observations
$n$ is the number of observations in the sample


Median 

In some frequency distributions the mean is not close to the
concentration of values. In such cases the mean is not a good measure
of the location of the distribution and the median, which is more central,
is generally used. The median of a set of values is the middle value when
the values are sorted into order. (When there is an even number of
values in the set the median is the arithmetic mean of the middle two
values.)


Mode 

The mode of a frequency distribution is the most frequently occurring
value. In general the mean and median are better measures of location;
however, the mode is useful when the values are unevenly spread.
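

These three measures can be computed directly with Python's standard `statistics` module. The sample values below are made up for illustration; they include one very large value to show how the mean can sit away from the concentration of values while the median does not.

```python
import statistics

# Hypothetical sample: weekly hours of television watched by nine respondents.
hours = [2, 3, 3, 4, 5, 5, 5, 6, 30]

mean_hours = statistics.mean(hours)      # sum of the values divided by n
median_hours = statistics.median(hours)  # middle value of the sorted sample
mode_hours = statistics.mode(hours)      # most frequently occurring value

# With an even number of values the median is the mean of the middle two.
even_median = statistics.median([1, 2, 3, 4])
```

The single respondent reporting 30 hours pulls the mean up to 7, above all but one of the observations, while the median and mode both stay at 5.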
Measures of spread


In summarising datasets it is also important to know the variability of the
values (i.e. how spread out the values are).


Range

The range is the difference between the largest and smallest values. It is a
common measure in industrial quality control and meteorology because
of its ease of computation. However, a disadvantage of the range is that it
tends to increase as the sample size increases. In addition, it does not
provide any information about the distribution of values.


Variance

The variance describes the spread of data around the arithmetic mean
and is the average of the squared differences between each value of the
variable in the sample (e.g. the height of each person) and the sample
mean (e.g. the mean height of the sample). The variance can be used to
make a statistical inference about the population from which the sample
is drawn. It is possible to have two samples with the same mean but
different variances. A larger variance indicates that the data are more
spread out about the mean.


The sample variance is calculated by:


$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$


where $n$ = number of observations in the sample
$x_i$ = value of the ith observation in the sample
$\bar{x}$ = arithmetic mean


NORMAL DISTRIBUTION


For many populations the distribution of values is a specific bell-shaped
curve, called the normal curve. The normal distribution is the most
useful distribution in statistics as certain properties always hold and
consequently the construction of confidence intervals (see chapter 4) is
straightforward. Errors made in measuring physical and economic
phenomena are often normally distributed. In addition, many other
distributions can be approximated by the normal curve. In a normal
distribution, about 95% of the sample means lie within two standard
deviations of either side of the population mean and more than 99% of
the sample means lie within three standard deviations of either side of
the population mean.


TABLES


Table production can give a researcher a clearer picture of what the data
hold and can therefore help determine the type of statistical analysis
that could be undertaken on the data. Tables can also be used to
summarise the data, enabling readers to draw their own conclusions.
Some examples of tables are given below.


UNIVARIATE

                  %
Agree            59
Disagree         41
Total           100
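

Percentage tables such as the univariate example above, and breakdowns by a further variable like those in the examples that follow, can be produced from coded responses with a few lines of Python. The records below are invented for illustration (they are not the data behind the tables in this chapter), and `percent_table` is a hypothetical helper, not a library function.

```python
from collections import Counter


def percent_table(responses):
    """Percentage distribution of a list of responses (whole percentages)."""
    counts = Counter(responses)
    n = len(responses)
    return {answer: round(100 * c / n) for answer, c in counts.items()}


# Invented records: (sex, answer) for ten respondents.
records = [
    ("M", "Agree"), ("M", "Agree"), ("M", "Agree"),
    ("M", "Disagree"), ("M", "Disagree"),
    ("F", "Agree"), ("F", "Agree"), ("F", "Agree"), ("F", "Agree"),
    ("F", "Disagree"),
]

# Univariate table: one variable.
univariate = percent_table([answer for _, answer in records])

# Bivariate table: the same variable tabulated separately for each sex.
bivariate = {
    sex: percent_table([a for s, a in records if s == sex])
    for sex in ("M", "F")
}
```

Each column is percentaged on its own total, which is why every column in tables of this kind sums to 100.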







BIVARIATE

               Males   Females
                   %         %
Agree             57        61
Disagree          43        39
Total            100       100


MULTIVARIATE

                      Males                   Females
               Secondary  Tertiary     Secondary  Tertiary
                       %         %             %         %
Agree                 54        60            56        66
Disagree              46        40            44        34
Total                100       100           100       100


More complex forms of analysis are possible. For example, standardising
for age and other variables when comparing data for different geographic
areas may be relevant for health related variables.


For more information on tables and other forms of data presentation see
chapter 11.


SUMMARY


The quality of statistical analysis depends not only on using the most 
suitable statistical method but also on the quality of the data. Good 
quality data will only be achieved with adequate planning before the 
survey is conducted, particularly if hypothesis testing is expected to form 
part of the analysis. Again it should be noted that analysing survey results 
can be a complex task and it is recommended that an experienced 
statistician be consulted. The Australian Bureau of Statistics can provide 
advice on the analysis of results through its Statistical Consultancy 
Service—see contact details on the last page. 
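

As a worked recap of the measures of spread described earlier in this chapter, the sketch below computes the range and the variance for two made-up samples that share the same mean. The `variance` function divides by n, following the description of the variance as the average of the squared differences; Python's `statistics.pvariance` uses the same divisor, whereas `statistics.variance` divides by n - 1, so the divisor appropriate to a particular analysis is worth confirming with a statistician.

```python
import statistics

# Two made-up samples with the same mean but different spread.
sample_a = [4, 5, 5, 5, 6]
sample_b = [1, 3, 5, 7, 9]


def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


def variance(values):
    """Average of the squared differences from the arithmetic mean (divisor n)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


range_a, range_b = value_range(sample_a), value_range(sample_b)
var_a, var_b = variance(sample_a), variance(sample_b)
```

Both samples have a mean of 5, but the second has the larger range and variance, reflecting values that are more spread out about the mean.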
CHAPTER 11


PRESENTATION


INTRODUCTION


Communicating the results of the survey to the people who
commissioned the survey is the ultimate goal. A further important phase
is involved: that of communicating the results, in a clear and logical
format, to the decision makers and information users.


The format of presentation should be tailored to satisfy the users.
Consideration should be given to the level of statistical understanding of
the users, particularly in regard to statistical terminology.


WRITTEN REPORT


In presenting the data some form of written report or paper is essential.
Such a report is more likely to be a success if it conveys a number of
specific messages rather than a broad spread of generalised information.
The report should follow a logical progression, give precise statements
on the conclusions which have been reached and, if appropriate, make
recommendations for future action. It should clearly distinguish between
verifiable facts, and opinion and interpretation. Careful consideration
should be given to language and grammar as well as the design and
layout (for example, graphs and tables may add to the quality of the
report). A preponderance of graphs and diagrams, however, may be just as
confusing as many pages of text, so it is important to strike a balance. A
careful study of other reports may be helpful in planning the design.


The contents of the report and its balance of words and tables will 
naturally depend on the subject, the conclusions and the likely readers. 
However, the following sections are generally included in a survey 
report: 


■ Introduction: states the purpose and aims of the survey and the aims
of the report; gives the background to the research; discusses previous
relevant studies; defines terms and concepts; states whether the survey
is testing an hypothesis or is exploratory.


■ Methodology: gives the method of sampling and information on the
survey population as well as how the data were analysed and the
statistical procedures which were used.


■ Findings and analysis: the main part of the report which deals with
details of the sample numbers, response rate, etc. and discusses
possible courses of action.


■ Conclusions: summarises the major findings of the report.


■ Recommendations: states what actions are indicated on the basis of
the conclusions.


■ References: lists the books, journals and papers referred to during the
study.


■ Appendixes: consists of items which may be useful to the reader
(e.g. the questionnaire) but are not essential to the report.


A synopsis which summarises the report in one page is often
worthwhile. Remembering that the report is likely to serve as a basis for
discussion, some other important considerations are the title, use of
headings and subheadings, the colour and design of the cover and the
overall appearance of the report, which should stimulate the reader’s
interest.


STATISTICAL PRESENTATION


The form of presentation will depend on the data. Tables and graphs are
the most common forms of presentation but other types are available. In
general, tables are more accurate, showing the actual values, whereas
graphs are more useful in showing relationships, concentrating on form,
shape and movement. Graphs are particularly useful in representing the
change in the value of a data item over a period of time.


Tables


Tables are the most common form of statistical presentation and should
show both additional and necessary information which could not be
conveyed in general text. A good table is one in which patterns and
exceptions are obvious at a glance and may be further enhanced by a
short paragraph commenting on a major feature of the data. Headings
must be clear and give descriptions of items and classes.


Some additional guidelines which may be helpful:


■ rounding data to about four or five significant figures makes data
easier to grasp and manipulate;


■ reading down a column is easier than reading across a row, especially
for a large number of items;


■ row and column averages or percentages may help the reader
interpret the data;


■ widely spaced columns are difficult to compare and should be
avoided; and


■ totalling of rows and/or columns is usually helpful.


Graphs


A graph is a visually attractive way of presenting data. Although the 
amount of information which can be presented is limited, a graph is a 
useful adjunct to other forms of statistical presentation and can often 
reveal 'hidden' facts in complex data. Relationships and trends are more 
clearly grasped and therefore better remembered from a graph, rather 
than from a table or text. However, a graph is not always necessary and 
should not be included as a matter of course. 
Constructing a graph

■ Titles: a title is essential and is best placed at the top. It should
indicate the ‘what, where and when’ of the graph as concisely as
possible and should be larger than the lettering on the graph itself.


■ Scales: the horizontal scale usually measures the time unit and the
vertical scale the variable under consideration. Where possible the
vertical scale should begin at zero to avoid over-estimating the
differences of the variable between time periods. When the data cover
a wide range, a break in the 'amount' scale may be useful but only if
this will not distort the overall picture. The time scale should never
be broken.


■ Other considerations:

  - avoid clutter. It can lead to misinterpretation;

  - both axes should be named with scales and units marked;

  - care should be exercised in plotting multiple variables in the one
    graph. Labels and/or a legend may complicate the graph
    unnecessarily;

  - footnotes may be used to explain unusual features, such as breaks
    in the series.


Types of graphs 

The decision as to which type of graph is to be used will depend on the 
field of study, the characteristics of the data, cost, study objectives, likely 
readers and the authors’ expertise. 


The following sections give a brief description of the various sorts of 
graphs which may be used: 


Line or curve graph 

These may be used when the emphasis is on the movement rather than 
the size of the data item or when several series are being compared. This 
type of graph shows variations in the data plotted over a period of time. 
It is widely used and easy to construct but should not be used when 
categorical variables are being plotted, e.g. comparing numbers of goats, 
horses and sheep. 


Bar or column graph 

This type of graph depicts numerical values over a given variable, for 
example time. The value is represented by the height of the column. This 
type of graph is especially effective for showing large changes from one 
period to the next. The columns may or may not be connected. 
Connected columns are best used when there are many columns and 
spacing would appear crowded. 


Grouped columns/bars graph

These are used to compare two or three different categories on the same
graph.


Semi-logarithmic graph

This graph is useful when comparing two series with widely different
arithmetic values. In such a case it is the relative change which is of
greatest interest. In this type of graph the arithmetic values are converted
to logarithmic values. This enables easier comparisons as two series with
the same rate of change will be parallel to one another.


Pie chart

This graph gives a comparison (in percentage terms) of components with
each other and with the whole. This type of graph can be used when
there are a small number of categories, say seven or fewer, and when the
composition of the population is of interest. If the actual number, rather
than the share, is of interest, other types of graph are more suitable as
the human eye can read lines more effectively than angles.


Map

A map can be used to show boundaries of local government areas, State
electoral provinces, etc. Maps can also be used to display survey results
in a spatial context, for example, concentrations of particular variables in
local government areas.


OTHER FORMS OF PRESENTATION


Depending on the circumstances (users, type of data, results), a written
report may be inadequate or may need to be supplemented. Oral
presentation of the results of a survey is often neglected as an important
means of conveying information. Whereas a written report provides great
detail with a wide range of results, an oral session can only emphasise a
few major points. However, this can often be most suitable depending
on the audience. As with a written report, poor presentation may cause
the survey results to be rejected. The spoken word and visual aids can
have a great impact on an audience. The presenter therefore should be
aware of the nature of the audience and know what survey results may
be contrary to existing ideas.


A poster is one way to attract attention but only one direct statement
should be made. The message must be noticeable at a glance and the
poster itself must be attractive to encourage possible users to inspect it.


A panel exhibit is an extension of the poster presentation. This type of
presentation gives more details and expands several main ideas. Again it
is important that the panels be colourful and attractive.


The use of videos or television can provide an additional means of
communication of the survey results.


SUMMARY


The presentation of data in some form of report is the culmination of 
many weeks, months or even years of work. Thoughtful use of text, 
tables and diagrams can provide a document that is understood and can 
be used. It will inform the reader by presenting all that needs to be 
known to make an informed decision. 
APPENDIX 1


STATISTICAL ANALYSIS TERMINOLOGY


Confidence

Sampling error allows researchers to express the accuracy of sampling
statistics in terms of levels of confidence. So, the higher the sampling
error, the less confident the researcher can be about the reliability of
his/her predictions or estimates.


Confidence interval

A specified interval, with the sample statistic at the centre, which is
expected to include the corresponding population value with a given
level of confidence.


Frequency distribution

The classification of the elements of a data set by a numerical
characteristic.


Data set

Data collected for a particular study is called a data set. A data set
represents a collection of elements and for each element, information on
one or more characteristics of interest is included. In the data set in the
example below, the element is country and the characteristic is the birth
rate. This data set is called a univariable data set as there is only one
characteristic of interest. The birth rate is referred to as a variable of the
data set as it takes on different values from country to country.

Country        Birth rate per 1 000
Greece                         15.7
Italy                          16.5
Australia                      12.7
USA                            11.9


Statistical tests

A statistical test is designed to show (with a certain level of confidence)
whether a sample estimate could have come from the population in
question, or occurred due to 'chance'.


t-test

The t-test, or Student's t-test, is usually used to test for significance of
difference between estimated means.


Chi-square test

This test can be used to test certain properties of the frequencies
observed in a sample. It can be used to test whether the observed
distribution of frequencies differs from a predetermined theoretical
distribution.


Analysis of variance

Just as the t-test provides a way of testing for differences between two
samples, analysis of variance is a technique which enables tests for
differences to be made simultaneously on three or more samples. This
type of analysis is often used in agricultural experiments and
manufacturing. Certain assumptions about the population/sample must
hold for analysis of variance to be useful.


Parameter

A parameter is the true value of a given variable in a population. For
example, the mean income of all families in a city is a parameter. An
important part of survey research involves the estimation of population
parameters on the basis of sample observations.


Statistic

A statistic is the summary description of a given variable in a sample.
Thus, the mean income computed from a sample survey is a statistic.
Sample statistics are used to make estimates of population parameters.


Sampling error

Probability sampling methods seldom if ever provide statistics exactly
equal to the parameters they are used to estimate. Probability theory
permits us to estimate the degree of error to be expected for a given
sample.


Variables

Characteristics such as sex, age, income and employment status are called
variables as they can take on different values. Surveys often aim at
describing the distribution of characteristics comprising a variable in a
population.
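

As an illustration of the chi-square test defined earlier in this appendix, the sketch below compares observed frequencies with a predetermined theoretical distribution. The counts are invented (100 respondents, 59 of them agreeing), the theoretical distribution is an assumed even split, and 3.841 is the standard 5% critical value of the chi-square distribution with one degree of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [59, 41]  # invented counts: agree, disagree
expected = [50, 50]  # assumed theoretical distribution: an even split

chi2 = chi_square_statistic(observed, expected)

# 5% critical value for 1 degree of freedom (number of categories - 1).
CRITICAL_5PCT_DF1 = 3.841
differs_significantly = chi2 > CRITICAL_5PCT_DF1
```

With these made-up counts the statistic is below the critical value, so at the 5% level the observed split could plausibly have occurred by chance under an even theoretical split.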
APPENDIX 2


STATISTICAL CLEARANCE PROCESS


Data collection activities by, or on behalf of, Commonwealth government
agencies are subject to a formal clearance process.


COMMONWEALTH GOVERNMENT STATISTICAL CLEARING HOUSE


In 1996, the Small Business Deregulation Task Force recommended that
there should be a central clearance process for all Commonwealth
government business surveys. Cabinet endorsed this recommendation,
and during 1997 the Statistical Clearing House was established.


The goal of the clearance process is to ensure that surveys impose
minimum loads on business respondents, and that the information
collected will be of sufficient quality to meet the objectives of the survey.


The Statistical Clearing House mandate affects all Commonwealth
government departments and agencies that conduct or commission
surveys of businesses.


The Statistical Clearing House operates from within the Australian Bureau
of Statistics (ABS), as part of its statistical coordination role, but operates
independently of the ABS’ own statistical collection activities.


WHICH SURVEYS NEED APPROVAL?


The Statistical Clearing House must review all surveys involving 50 or
more businesses run by, or on behalf of, any Commonwealth agency.
This includes ABS surveys. Surveys that satisfy the review criteria receive
a registered approval number to be used on forms and explanatory
material sent to businesses.


The review process includes both voluntary and mandatory surveys, as
well as surveys conducted by a consultant on behalf of the Government.


WHEN TO CONTACT THE STATISTICAL CLEARING HOUSE


The data collection phase of a survey must not begin until the survey is 
approved. 


Survey managers are encouraged to notify the Statistical Clearing House 
about their business surveys as early as possible in the development 
process, and to provide information as it becomes available. 


For example, information about survey administration and objectives can 
be submitted early in the process, while sample design and questionnaire 
details can be submitted later when they have been finalised. In this way 
the review can be done in stages, avoiding delays in the survey 
development timetable. 
THE STATISTICAL CLEARING HOUSE REVIEW 


In general terms, the review process ensures that: 


- there is no adequate alternative source of information available and 
  no reasonable, alternative means of obtaining the required information 
  with less respondent burden; 


- the survey methodology is appropriate to meet the objectives of the 
  survey, in that: 

    - the list of businesses for the survey provides adequate coverage; 
    - survey forms (questionnaires) have been appropriately tested; and 
    - the expected levels and quality of response are justified and 
      sufficient for the intended uses of the results; 


- a group of businesses or business associations have been consulted 
  about the nature and objectives of the survey and data availability, and 
  there is an assessment of respondent burden; and 


- there are adequate systems (both computer and people-based) to 
  ensure the survey is conducted and processed in a manner that will 
  provide output of appropriate quality. 


The Statistical Clearing House provides advice and assistance at all stages 
of the review process. Specific requirements on the type of information 
required for the review can be found on the Internet, at 
http://www.sch.abs.gov.au 


TURNAROUND TIME 


The Statistical Clearing House guarantees to review a proposal for a 
survey within 20 working days of receipt of all the documentation 
required. Turnaround will be faster than this if notification is made in 
advance and if the required information is provided as it becomes 
available. 


THE REGISTER OF SURVEYS 


The Commonwealth Register of Surveys of Businesses is available as a 
by-product of the review process. The register includes information about 
each survey that has been reviewed: a survey contact person, the survey 
objectives, and a description of the survey procedures. At the discretion 
of the survey manager, it may also include a link to the actual survey 
findings. 


FURTHER INFORMATION 


The Statistical Clearing House can be contacted by: 


Email: statistical.clearing.house@abs.gov.au 
Telephone: Canberra 02 6252 5285 
Facsimile: Canberra 02 6252 8008 

or by visiting the web site at www.sch.abs.gov.au 
APPENDIX 3 COLLECTING PERSONAL INFORMATION 


If you collect personal information, it is recommended that you be aware 
of the following. 


INFORMATION PRIVACY PRINCIPLES 1-3 


When Commonwealth government agencies collect personal information 
they are regulated by the Privacy Act 1988 (Cwlth). 


There are eleven Information Privacy Principles (IPPs) in the Privacy Act. 
IPPs 1-3 are concerned with how government agencies collect personal 
information. A summary of IPPs 1-3 is included below. 


Although organisations other than Commonwealth government agencies 
are not legally bound by these principles in the same way, it is suggested 
that the principles are worth consideration by all those collecting 
personal information. 


SUMMARY OF INFORMATION PRIVACY PRINCIPLES 


Principle 1 


Agencies can only collect personal information: 


- for a lawful purpose that is directly related to their functions; and 
- if collecting the information is necessary for or directly related to that 
  purpose. 


Agencies must not collect personal information unlawfully or unfairly. 


Principle 2 


If an agency asks a person for personal information about himself or 
herself, it must tell the person: 


- why it is collecting the information; 
- whether it has legal authority to collect the information; and 
- to whom it usually gives that sort of information. 


Principle 3 


When an agency asks for personal information, the agency must do its 
best to make sure that the information is: 


- relevant to the agency’s reason for collecting it; 
- up-to-date; and 
- complete. 
APPENDIX 4 SELECTED ABS CLASSIFICATIONS 


AREA                  Australian Standard Geographical Classification 
                      Standard Australian Classification of Countries 

INDUSTRY              Australian and New Zealand Standard Industrial Classification 

COMMODITY             Australian and New Zealand Standard Commodity Classification 
                      Harmonized Commodity Description and Coding System 

INSTITUTIONAL SECTOR  Standard Economic Sector Classification of Australia 

RESEARCH              Australian Standard Research Classification 

OCCUPATION            Australian Standard Classification of Occupations 

CRIME                 Australian Standard Offence Classification 

LANGUAGES             Australian Standard Classification of Languages 

RELIGION              Australian Standard Classification of Religious Groups 

QUALIFICATIONS        Australian Bureau of Statistics Classification of Qualifications 


More detailed descriptions of these classifications and a more 
comprehensive list of ABS classifications can be found in the ABS 
publication A Guide to Major ABS Classifications, 1998 
(Cat. no. 1291.0). 
SELF-HELP ACCESS TO STATISTICS 


DIAL-A-STATISTIC  For current and historical Consumer Price Index data, 
                  call 1902 981 074. 

                  For the latest figures for National Accounts, Balance of 
                  Payments, Labour Force, Average Weekly Earnings, 
                  Estimated Resident Population and the Consumer Price 
                  Index call 1900 986 400. 

                  These calls cost 75c per minute. 

INTERNET          www.abs.gov.au 

LIBRARY           A range of ABS publications is available from public and 
                  tertiary libraries Australia wide. Contact your nearest library 
                  to determine whether it has the ABS statistics you require. 


WHY NOT SUBSCRIBE? 


PHONE             +61 1300 366 323 

FAX               +61 3 9615 7848 


CONSULTANCY SERVICES 


ABS offers consultancy services on a user pays basis to 
help you access published and unpublished data. Data that 
are already published and can be provided within 
5 minutes are free of charge. Statistical methodological 
services are also available. Please contact: 


City        By phone        By fax 

Canberra    02 6252 6627    02 6207 0282 
Sydney      02 9268 4611    02 9268 4668 
Melbourne   03 9615 7755    03 9615 7798 
Brisbane    07 3222 6351    07 3222 6283 
Perth       08 9360 5140    08 9360 5955 
Adelaide    08 8237 7400    08 8237 7566 
Hobart      03 6222 5800    03 6222 5995 
Darwin      08 8943 2111    08 8981 1218 


POST              Client Services, ABS, PO Box 10, Belconnen ACT 2616 

EMAIL             client.services@abs.gov.au 


